Scrapyd is too slow at scheduling spiders

Published 2024-05-19 10:30:11


I am running Scrapyd and hit a strange problem when launching 4 spiders at the same time.

2012-02-06 15:27:17+0100 [HTTPChannel,0,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:17+0100 [HTTPChannel,1,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:17+0100 [HTTPChannel,2,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:17+0100 [HTTPChannel,3,127.0.0.1] 127.0.0.1 - - [06/Feb/2012:14:27:16 +0000] "POST /schedule.json HTTP/1.1" 200 62 "-" "python-requests/0.10.1"
2012-02-06 15:27:18+0100 [Launcher] Process started: project='thz' spider='spider_1' job='abb6b62650ce11e19123c8bcc8cc6233' pid=2545 
2012-02-06 15:27:19+0100 [Launcher] Process finished: project='thz' spider='spider_1' job='abb6b62650ce11e19123c8bcc8cc6233' pid=2545 
2012-02-06 15:27:23+0100 [Launcher] Process started: project='thz' spider='spider_2' job='abb72f8e50ce11e19123c8bcc8cc6233' pid=2546 
2012-02-06 15:27:24+0100 [Launcher] Process finished: project='thz' spider='spider_2' job='abb72f8e50ce11e19123c8bcc8cc6233' pid=2546 
2012-02-06 15:27:28+0100 [Launcher] Process started: project='thz' spider='spider_3' job='abb76f6250ce11e19123c8bcc8cc6233' pid=2547 
2012-02-06 15:27:29+0100 [Launcher] Process finished: project='thz' spider='spider_3' job='abb76f6250ce11e19123c8bcc8cc6233' pid=2547 
2012-02-06 15:27:33+0100 [Launcher] Process started: project='thz' spider='spider_4' job='abb7bb8e50ce11e19123c8bcc8cc6233' pid=2549 
2012-02-06 15:27:35+0100 [Launcher] Process finished: project='thz' spider='spider_4' job='abb7bb8e50ce11e19123c8bcc8cc6233' pid=2549 

I have configured Scrapyd with these settings:

[settings block not preserved in this copy of the post]

Why doesn't Scrapyd run the spiders simultaneously, as scheduled?
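The poster's actual settings were lost above. For reference, the Scrapyd options that govern concurrency live in scrapyd.conf; a minimal sketch with illustrative values (these are assumptions, not the poster's real configuration):

```ini
[scrapyd]
# Hard cap on concurrent Scrapy processes; 0 means derive it from
# max_proc_per_cpu multiplied by the number of CPUs.
max_proc = 10
# Per-CPU cap, only used when max_proc is 0.
max_proc_per_cpu = 4
```

Note that even with a high process cap, jobs may still start staggered, which is what the question's log shows.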


Tags: project, json, http, job, post, requests, process, pid
2 Answers

I solved this by editing my copy of scrapyd/app.py at line 30, changing

timer = TimerService(5, poller.poll)

to

timer = TimerService(0.1, poller.poll)

Edit: AliBZ's comment below about the configuration setting is a better way to change the polling frequency.
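The configuration route referred to here is, as far as I can tell, Scrapyd's poll_interval option, which controls how often the launcher polls the queues for pending jobs (assuming a Scrapyd version that supports it). A sketch of the same 0.1-second value set via scrapyd.conf instead of patching app.py:

```ini
[scrapyd]
# Seconds between polls of the job queues. The default of 5.0 matches
# the ~5 s gaps between "Process started" lines in the question's log.
poll_interval = 0.1
```

This survives upgrades, unlike editing the installed scrapyd source.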

In my experience with Scrapyd, it does not run a spider immediately when you schedule it. It usually waits a while, until the current spider is up and running, and only then starts the next spider process (scrapy crawl).

So Scrapyd launches the processes one by one until it reaches the max_proc count.

From your log I can see that each of your spiders runs for about 1 second. I think you would see all of them running at the same time if they ran for at least 30 seconds.
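The staggering can be sketched with a bit of arithmetic, using a simplified model I am assuming here (one pending job launched per poll of the queue), not Scrapyd's actual scheduler code:

```python
def last_job_start_delay(n_jobs: int, poll_interval: float) -> float:
    """Rough lower bound on when the last of n queued jobs starts,
    assuming at most one job is launched per poll of the queue."""
    return (n_jobs - 1) * poll_interval

# With the default 5 s poll interval, the 4th spider starts ~15 s after
# the 1st -- matching the 15:27:18 .. 15:27:33 entries in the log.
print(last_job_start_delay(4, 5.0))             # 15.0
print(round(last_job_start_delay(4, 0.1), 3))   # 0.3
```

Since each spider here finishes in about a second, the 5-second poll dominates the total wall time, which is why lowering the interval makes the jobs appear to run back to back.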
