Published 2024-09-30 14:37:57
Question:
For debugging purposes, I would like to collect only the items that have been dropped (via `raise DropItem`) in Scrapy (the Python library).
I want this list because, during cleaning, some pages contain HTML errors, and I want to add those URLs to my spider's blacklist.
Answer:

Listen for the `item_dropped` signal:

```python
import scrapy
from scrapy import signals
from scrapy.crawler import CrawlerProcess


class Spider(scrapy.Spider):
    name = 'spider'
    start_urls = ['http://example.com']

    def parse(self, response):
        yield {'url': response.url}


def item_dropped(item, response, exception, spider):
    # Called whenever an item pipeline raises DropItem;
    # the handler receives the dropped item, the response it
    # came from, and the DropItem exception.
    print(item, response.url, exception)


process = CrawlerProcess()
process.crawl(Spider)
for crawler in process.crawlers:
    crawler.signals.connect(item_dropped, signal=signals.item_dropped)
process.start()
```

Note that the handler must accept the `exception` argument as well: Scrapy sends `item`, `response`, `exception`, and `spider` with this signal.
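To build the blacklist mentioned in the question, the handler can accumulate the URLs of dropped items instead of printing them. A sketch, where `dropped_urls` and `record_dropped` are illustrative names of my own; the function is connected with `crawler.signals.connect(record_dropped, signal=signals.item_dropped)` exactly like the print handler:

```python
from types import SimpleNamespace

dropped_urls = set()  # collects URLs whose items were dropped


def record_dropped(item, response, exception, spider):
    # item_dropped handlers receive the response the item came from;
    # remember its URL so it can be blacklisted on the next run.
    dropped_urls.add(response.url)


# Quick check with a stand-in response object (no crawl needed):
record_dropped({}, SimpleNamespace(url='http://bad.example'), None, None)
print(dropped_urls)  # {'http://bad.example'}
```

After the crawl finishes, `dropped_urls` holds every URL that produced a dropped item and can be written to the spider's blacklist file.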