JSON file is not created by Python Scrapy spider

/root
nobel_winners  scrapy.cfg

/nobel_winners:
__init__.py  items.py  pipelines.py  spiders  __pycache__  middlewares.py  settings.py

/nobel_winners/spiders:
__init__.py  __pycache__  nwinners_list_spider.py

#encoding:utf-8
import scrapy

class NWinnerItem(scrapy.Item):
    country = scrapy.Field()

class NWinnerSpider(scrapy.Spider):
    name = 'nwinners_list'
    allowed_domains = ['en.wikipedia.org']
    start_urls = ["https://en.wikipedia.org/wiki/List_of_Nobel_laureates_by_country"]

    def parse(self, response):
        h2s = response.xpath('//h2')
        for h2 in h2s:
            country = h2.xpath('span[@class="mw-headline"]/text()').extract()

1 answer

User

#1 · Posted 2024-09-30 18:23:26

Try the following code:

import scrapy

class NWinnerItem(scrapy.Item):
    country = scrapy.Field()

class NWinnerSpider(scrapy.Spider):
    name = 'nwinners_list'
    allowed_domains = ['en.wikipedia.org']
    start_urls = ["https://en.wikipedia.org/wiki/List_of_Nobel_laureates_by_country"]

    def parse(self, response):
        # Select every <h2> heading on the page.
        h2s = response.xpath('//h2')

        for h2 in h2s:
            # Yield one item per heading so Scrapy can collect and export it.
            yield NWinnerItem(
                country = h2.xpath('span[@class="mw-headline"]/text()').extract_first()
            )

Then run scrapy crawl nwinners_list -o nobel_winners.json -t json (Scrapy can also infer the format from the .json extension, so -t json should be optional).

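In the callback function, you parse the response (the web page) and return dicts with the extracted data, Item objects, Request objects, or an iterable of these objects. See the Scrapy documentation.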

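That is why 0 items were scraped: the original parse() computes country but never returns or yields it. You need to return (or yield) the items!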
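The difference can be reproduced in plain Python, without Scrapy. In the rough sketch below, collect() is a hypothetical stand-in for a framework that consumes whatever a callback hands back; it is not Scrapy's actual engine, and the function names are invented for the example.

```python
def parse_without_yield(headlines):
    """Mirrors the question's callback: the value is computed, then dropped."""
    for text in headlines:
        country = text.strip()  # assigned but never returned or yielded
    # Falling off the end of the function returns None.


def parse_with_yield(headlines):
    """Mirrors the fixed callback: each value is handed back to the caller."""
    for text in headlines:
        yield {"country": text.strip()}


def collect(callback, data):
    """Hypothetical stand-in for a framework consuming callback output."""
    result = callback(data)
    return list(result) if result is not None else []


headlines = [" Argentina ", " Australia "]
scraped_without = collect(parse_without_yield, headlines)
scraped_with = collect(parse_with_yield, headlines)

print(len(scraped_without), "items without yield")
print(len(scraped_with), "items with yield")
```

The first callback does all the work but hands nothing back, so the consumer has nothing to export.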

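Also note that .extract() returns a list of all matches for the XPath query, while .extract_first() returns only the first element of that list (or None if nothing matched).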
