Scrapy field names that do not conform to Python variable name restrictions

1 answer

User

#1 · Posted 2024-06-01 10:05:06

I suggest using a third-party library called scrapy-jsonschema. With it, you can define your item like this:

from scrapy_jsonschema.item import JsonSchemaItem

class MyItem(JsonSchemaItem):
    jsonschema = {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "title": "MyItem",
        "description": "My Item with spaces",
        "type": "object",
        "properties": {
            "id": {
                "description": "The unique identifier for the employee",
                "type": "integer"
            },
            "name": {
                "description": "Name of the employee",
                "type": "string"
            },
            "job title": {
                "description": "The title of employee's job.",
                "type": "string"
            }
        },
        "required": ["id", "name", "job title"]
    }

Then populate it like this:

item = MyItem()
item['job title'] = 'Boss'

You can read more about it in the scrapy-jsonschema project documentation.
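For context, a name such as job title cannot be written as a class attribute in Python, which is presumably why the field is set through a string key. The sketch below checks this with the standard library only; is_valid_attribute_name is a hypothetical helper introduced for illustration, not part of Scrapy or scrapy-jsonschema.

```python
import keyword


def is_valid_attribute_name(name: str) -> bool:
    """Hypothetical helper: True if `name` could be declared as a class attribute."""
    # A usable attribute name must be an identifier and must not be a reserved keyword.
    return name.isidentifier() and not keyword.iskeyword(name)


# Field names taken from the schema in this answer.
field_names = ["id", "name", "job title"]

# Names that can only be used as string keys, e.g. item['job title'].
needs_string_key = [n for n in field_names if not is_valid_attribute_name(n)]
print(needs_string_key)
```

Only the names collected in needs_string_key require the item['...'] form; the others could also be declared the usual way.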

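This solution addresses the item definition, as you asked, but you can get a similar result without defining an item at all. For example, you can scrape the data into a dict and yield it back to Scrapy: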

yield {
    "id": response.xpath('...').get(),
    "name": response.xpath('...').get(),
    "job title": response.xpath('...').get(),
}

Run scrapy crawl myspider -o file.csv to export the results to CSV; the columns will have the names you chose.

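You could also have the spider write to the CSV file directly, or do it in an item pipeline, and so on. There are several ways to do this without an item definition.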
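As one possible shape for the pipeline route, here is a minimal sketch built on the standard csv module. The class name CsvWriterPipeline, the default path items.csv, and the fixed column list are assumptions made for illustration; the method names follow Scrapy's classic item pipeline interface (open_spider, process_item, close_spider), which may differ slightly between Scrapy versions.

```python
import csv


class CsvWriterPipeline:
    """Hypothetical item pipeline that writes each scraped dict to a CSV file."""

    # Assumed column names, taken from the dict example in this answer.
    # Spaces are fine here because these are CSV headers, not Python identifiers.
    fieldnames = ["id", "name", "job title"]

    def __init__(self, path="items.csv"):
        self.path = path  # assumed output location

    def open_spider(self, spider):
        # Open the file once per crawl and write the header row.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(self.file, fieldnames=self.fieldnames)
        self.writer.writeheader()

    def process_item(self, item, spider):
        # Write one row per item, then pass the item on unchanged.
        self.writer.writerow(dict(item))
        return item

    def close_spider(self, spider):
        self.file.close()
```

A pipeline like this would still need to be enabled through the ITEM_PIPELINES setting of the project. For a plain CSV export, the -o option shown earlier is probably simpler.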
