I can't apply the StackOverflow answer below, because I'm running multiple spiders and rely on CrawlerProcess rather than Crawler:
How to get stats from a scrapy run?
I'd like to access the stats for both runs with something like get_stats(), but I can't work out which object exposes a get_stats() method. Any help is much appreciated.
import scrapy
from scrapy.crawler import CrawlerProcess


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def parse(self, response):
        yield {
            'name': response.css('small.author::text').extract_first()
        }


class QuotesSpider1(QuotesSpider):
    name = "quotes1"
    start_urls = ['http://quotes.toscrape.com/page/1/']


class QuotesSpider2(QuotesSpider):
    name = "quotes2"
    start_urls = ['http://quotes.toscrape.com/page/2/']


if __name__ == "__main__":
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
        'FEED_FORMAT': 'jsonlines',
        'FEED_URI': 'result.jl',
    })
    process.crawl(QuotesSpider1)
    process.crawl(QuotesSpider2)
    process.start()