Error when trying to scrape a JS page with Scrapy and Splash

Posted 2024-05-19 14:44:12


However, I keep running into this problem:

 2018-09-13 14:50:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
 2018-09-13 14:50:36 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6028
 2018-09-13 14:50:37 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
 2018-09-13 14:50:38 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://localhost:8050/robots.txt> (referer: None)
 2018-09-13 14:51:10 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://quotes.toscrape.com/js/ via http://localhost:8050/render.html> (failed 1 times): 504 Gateway Time-out
 2018-09-13 14:51:36 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 2 pages/min), scraped 0 items (at 0 items/min)
 2018-09-13 14:51:40 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://quotes.toscrape.com/js/ via http://localhost:8050/render.html> (failed 2 times): 504 Gateway Time-out
 2018-09-13 14:52:00 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://quotes.toscrape.com/js/ via http://localhost:8050/render.html> (failed 3 times): 502 Bad Gateway
 2018-09-13 14:52:00 [scrapy.core.engine] DEBUG: Crawled (502) <GET http://quotes.toscrape.com/js/ via http://localhost:8050/render.html> (referer: None)
 2018-09-13 14:52:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <502 http://quotes.toscrape.com/js/>: HTTP status code is not handled or not allowed

Here is my code:

^{pr2}$

I have installed scrapy-splash and added the required settings to settings.py. My Splash server is also running at http://localhost:8050/.
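For reference, the settings that the scrapy-splash README asks for look roughly like the block below. This is a sketch of the documented configuration, not the poster's actual settings.py (which is not shown in the question); the SPLASH_URL value is assumed from the server address mentioned above.

```python
# settings.py (sketch based on the scrapy-splash README; not the poster's file)

# Address of the running Splash instance (assumed from the question text).
SPLASH_URL = 'http://localhost:8050'

# Route requests through Splash and keep compression handling after it.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Avoid storing duplicate Splash arguments with every queued request.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Make duplicate filtering and HTTP caching aware of Splash arguments.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

If any of these entries is missing or misspelled, requests may not reach Splash at all; that would not by itself explain the 504 responses in the log, which already come back via http://localhost:8050/render.html.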

Also, when I try to render any URL in the Splash server's web UI, I get a different error:

HTTP Error 400 (Bad Request) Type: ScriptError -> LUA_ERROR Error happened while executing Lua script

Lua error: [string "function main(splash, args) ..."]:2: network3

I am using:

  • Splash version: 3.2

  • Lua 5.2
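
A note on reading the Lua error above: as far as I know, Splash reports failed page loads as "network<N>", where N is a Qt QNetworkReply::NetworkError code. Under that assumption, a small helper like the one below can turn the message into a readable name. The function name describe_splash_error and the lookup table are hypothetical, written for illustration; they are not part of Splash or scrapy-splash.

```python
import re

# Qt QNetworkReply::NetworkError codes 1-9 (network-layer errors).
# Assumption: Splash's "network<N>" suffix maps onto these codes.
QT_NETWORK_ERRORS = {
    1: "ConnectionRefusedError",
    2: "RemoteHostClosedError",
    3: "HostNotFoundError",
    4: "TimeoutError",
    5: "OperationCanceledError",
    6: "SslHandshakeFailedError",
    7: "TemporaryNetworkFailureError",
    8: "NetworkSessionFailedError",
    9: "BackgroundRequestNotAllowedError",
}


def describe_splash_error(message):
    """Return the Qt error name for a 'network<N>' token, or None if absent."""
    match = re.search(r"\bnetwork(\d+)\b", message)
    if match is None:
        return None
    code = int(match.group(1))
    return QT_NETWORK_ERRORS.get(code, "unknown Qt network error %d" % code)


# The Lua error line quoted in the question.
lua_error = '[string "function main(splash, args) ..."]:2: network3'
print(describe_splash_error(lua_error))
```

If that mapping holds, network3 would mean the Splash process could not resolve the target hostname, which would be consistent with the 504 and 502 responses in the Scrapy log. One possible cause is that the Splash container has no working DNS or outbound network access, but the question does not contain enough information to confirm this.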

