Scrapy: how to run a spider two or more times from another Python script

# interface
def search(keyword):
    configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
    runner = CrawlerRunner()
    d = runner.crawl(JingdongSpider, keyword)
    d.addBoth(lambda _: reactor.stop())
    reactor.run()  # the script will block here until the crawling is finished

2 answers

User

Answer 1 · edited 2024-07-04 13:51:47

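In your code sample, every call to search() starts the Twisted reactor (reactor.run()). This does not work because there is only one reactor per process, and you cannot start it twice.

There are two ways to solve the problem, both described in the Scrapy documentation. Either stick with CrawlerRunner but move reactor.run() out of your search() function, so that it is called only once. Or use CrawlerProcess and simply call crawler_process.start() once. The second approach is simpler; your code would look like this: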

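from scrapy.crawler import CrawlerProcess
from dirbot.spiders.dmoz import DmozSpider

def search(runner, keyword):
    # Only schedules the crawl (returns a Deferred); does not start the reactor
    return runner.crawl(DmozSpider, keyword)

runner = CrawlerProcess()
search(runner, "alfa")
search(runner, "beta")
runner.start()  # starts the reactor once; blocks until both crawls finish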
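The only real difference between the question's code and this answer is *when* the reactor is started. The toy model below makes that ordering visible in plain Python, without Scrapy or network access. ToyReactor, schedule() and the two search_* functions are hypothetical stand-ins, not Twisted or Scrapy APIs; the model only mimics the one-start-per-process rule, with schedule() playing roughly the role of runner.crawl() and run() the role of reactor.run() / runner.start().

```python
class ReactorNotRestartable(Exception):
    """Toy stand-in, named after twisted.internet.error.ReactorNotRestartable."""


class ToyReactor:
    """Hypothetical one-shot event loop: run() may succeed only once."""

    def __init__(self):
        self._started_before = False
        self._pending = []  # (callable, args) pairs scheduled before run()

    def schedule(self, fn, *args):
        # Roughly like runner.crawl(): queue work without starting the loop.
        self._pending.append((fn, args))

    def run(self):
        # Roughly like reactor.run() / runner.start(): refuses a second start.
        if self._started_before:
            raise ReactorNotRestartable("the reactor cannot be started twice")
        self._started_before = True
        while self._pending:  # run all scheduled work, then "stop"
            fn, args = self._pending.pop(0)
            fn(*args)


def search_question_style(reactor, results, keyword):
    # The question's pattern: schedule one crawl, then start the loop at once.
    reactor.schedule(results.append, keyword)
    reactor.run()


def search_answer_style(reactor, results, keyword):
    # The answer's pattern: only schedule; the caller starts the loop once.
    reactor.schedule(results.append, keyword)


r1, first = ToyReactor(), []
search_question_style(r1, first, "alfa")
try:
    search_question_style(r1, first, "beta")  # second start is rejected
except ReactorNotRestartable as exc:
    print("second search() failed:", exc)

r2, second = ToyReactor(), []
search_answer_style(r2, second, "alfa")
search_answer_style(r2, second, "beta")
r2.run()  # started exactly once, runs both scheduled crawls
print(second)  # ['alfa', 'beta']
```

In the question's pattern the "beta" crawl is scheduled but never runs, because the second start fails; in the answer's pattern both crawls run under a single start.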

User

Answer 2 · edited 2024-07-04 13:51:47

As Pawel Miech said:

In your code sample you are making calls to twisted.reactor starting it on every function call. This is not working because there is only one reactor per process and you cannot start it twice.

I found a way to solve the problem: just use multiprocessing.

Like this:

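from multiprocessing import Process
import jingdong_spider  # the module defining search() from the question

def run_spider(keyword):
    # A fresh child process gets its own, never-started Twisted reactor
    p = Process(target=jingdong_spider.search, args=(keyword,))
    p.start()
    p.join()  # wait until this crawl has finished

if __name__ == '__main__':
    run_spider('alfa')
    run_spider('beta')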

If anyone runs into problems using Python multiprocessing, it's best to read the Python documentation.
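Why the multiprocessing answer works can be checked without Scrapy. The sketch below replaces the reactor with a hypothetical module-level flag (_reactor_started) that may be set only once per process, and runs each toy search() in its own Process; the names are illustrative stand-ins, not Scrapy or Twisted APIs. Because every child starts with the flag unset, each run exits with code 0, while a second call inside one process fails. The `if __name__ == "__main__":` guard sits at module level because on platforms that spawn child processes (Windows, and macOS by default) the child re-imports the main module.

```python
from multiprocessing import Process

# Hypothetical stand-in for the Twisted reactor: a per-process flag that
# may be set only once, like reactor.run() may be called only once.
_reactor_started = False


def search(keyword):
    """Toy version of the question's search(): 'starts the reactor' once."""
    global _reactor_started
    if _reactor_started:
        raise RuntimeError("reactor already started in this process")
    _reactor_started = True
    print("crawled", keyword)


def run_spider(keyword):
    # Run search() in a brand-new child process, so it sees fresh state.
    p = Process(target=search, args=(keyword,))
    p.start()
    p.join()  # block until this "crawl" has finished
    return p.exitcode  # 0 means the child finished without an exception


if __name__ == "__main__":
    # Both calls succeed: neither child inherits a started "reactor".
    print(run_spider("alfa"), run_spider("beta"))
```

The trade-off is the cost of a new process per crawl, in exchange for never having to restructure code that calls reactor.run() internally.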
