Python Scrapy parse() function: where does the return value go?

import scrapy
from myproject.items import MyItem


class MySpider(scrapy.Spider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = [
        'http://www.example.com/1.html',
        'http://www.example.com/2.html',
        'http://www.example.com/3.html',
    ]

    def parse(self, response):
        # Each <h3> on the page becomes one scraped item.
        for h3 in response.xpath('//h3').extract():
            yield MyItem(title=h3)

        # Each link becomes a new request, handled by this same callback.
        for url in response.xpath('//a/@href').extract():
            yield scrapy.Request(url, callback=self.parse)

1 Answer

User

#1 · Posted on 2024-09-21 11:35:59

According to the documentation:

The parse() method is in charge of processing the response and returning scraped data (as Item objects) and more URLs to follow (as Request objects).

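In other words, returned/yielded items and requests are handled differently. Items are handed to the item pipelines and item exporters, while requests are put into the Scheduler, which passes them on to the Downloader to make the request and return a response. The engine then receives the response and gives it to the spider for processing (to the callback method).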
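To make the routing concrete, here is a toy model in plain Python. It is not Scrapy's implementation: the `Item` and `Request` classes, the `FAKE_SITE` dictionary standing in for the Downloader, and the `crawl` loop standing in for the engine are all hypothetical names made up for this sketch. It only shows the idea that whatever the callback yields is sorted by type, with requests going back into a queue and everything else going to the "pipeline".

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    title: str


@dataclass
class Request:
    url: str
    callback: Callable


# Hypothetical in-memory "web": url -> (headings, links).
# It replaces the real Downloader so the sketch runs offline.
FAKE_SITE = {
    "http://www.example.com/1.html": (["First"], ["http://www.example.com/2.html"]),
    "http://www.example.com/2.html": (["Second"], []),
}


def parse(response):
    """Callback: yields scraped items and follow-up requests, like a spider's parse()."""
    headings, links = response
    for heading in headings:
        yield Item(title=heading)
    for url in links:
        yield Request(url, callback=parse)


def crawl(start_urls):
    """Toy engine loop: routes each yielded object by its type."""
    scheduler = deque(Request(url, parse) for url in start_urls)
    seen = set(start_urls)
    pipeline_output = []

    while scheduler:
        request = scheduler.popleft()
        response = FAKE_SITE[request.url]  # stand-in for the Downloader
        for result in request.callback(response):
            if isinstance(result, Request):
                # Requests go back to the scheduler (skipping duplicates).
                if result.url not in seen:
                    seen.add(result.url)
                    scheduler.append(result)
            else:
                # Anything else is treated as an item for the pipeline.
                pipeline_output.append(result)

    return pipeline_output


items = crawl(["http://www.example.com/1.html"])
print([item.title for item in items])  # ['First', 'Second']
```

The point of the sketch is that parse() never "returns to" your own code. The caller is the framework, and it decides what happens to each yielded object.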

The whole data-flow process is described in great detail on the Architecture Overview page.

Hope that helps.
