I am trying to catch every possible error and mark the page as "unhealthy". The problem is that not all errors are delivered to parse or to errback; some of them just get lost.
When yielding the request I set dont_filter=True and errback=self.error:
from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import DNSLookupError, TCPTimedOutError, TimeoutError

def error(self, failure):
    # log all failures
    meta = failure.request.meta
    website = meta['website']
    if failure.check(HttpError):
        # these exceptions come from the HttpError spider middleware;
        # failure.value carries the non-200 response
        response = failure.value.response
        website.set_response_code(response.status, save=False)
    elif failure.check(DNSLookupError):
        website.set_response_code(WebSite.RESPONSE_CODE__DNS_LOOKUP_ERROR, save=False)
    elif failure.check(TimeoutError, TCPTimedOutError):
        website.set_response_code(WebSite.RESPONSE_CODE__TIMEOUT, save=False)
    else:
        website.set_response_code(WebSite.RESPONSE_CODE__UNKNOWN, save=False)
Do you know how I can catch all of them?

EDIT

Even some timeout errors get lost, such as:
2020-09-07 11:27:09 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.example.com/>
Traceback (most recent call last):
File "/home/ubuntu/.virtualenvs/newswd/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/home/ubuntu/.virtualenvs/newswd/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/home/ubuntu/.virtualenvs/newswd/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
return (yield download_func(request=request, spider=spider))
File "/home/ubuntu/.virtualenvs/newswd/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/ubuntu/.virtualenvs/newswd/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 374, in _cb_timeout
raise TimeoutError("Getting %s took longer than %s seconds." % (url, timeout))
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://www.example.com/ took longer than 30.0 seconds..