Scrapy errback doesn't catch all errors

Posted 2024-09-30 08:19:34


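I'm trying to catch every possible error and record the page as "unhealthy". The problem is that not all errors are sent to the errback. Some of them just get lost.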

When yielding the requests I set dont_filter=True and errback=self.error:
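A minimal sketch of that setup, assuming the website object reaches the errback through meta as in the errback code. The helper name request_kwargs is hypothetical: it only assembles the keyword arguments that would be handed to scrapy.Request, so every request built through it carries the same errback and dont_filter=True, and the sketch runs without Scrapy installed.

```python
def request_kwargs(url, website, callback, errback):
    """Keyword arguments for scrapy.Request (hypothetical helper).

    Building every request through one function makes it harder to
    yield a request that silently lacks the errback.
    """
    return {
        "url": url,
        "callback": callback,
        "errback": errback,            # receives a Failure if processing the request raises
        "dont_filter": True,           # bypass the duplicate-request filter
        "meta": {"website": website},  # read back as failure.request.meta['website']
    }
```

Inside the spider it would be used as: yield scrapy.Request(**request_kwargs(url, website, self.parse, self.error))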

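from scrapy.spidermiddlewares.httperror import HttpError
# Import TimeoutError from Twisted: it is not Python's built-in TimeoutError
from twisted.internet.error import DNSLookupError, TCPTimedOutError, TimeoutError

# WebSite is the project's own model; its import is not shown in the question.

def error(self, failure):

    # record a response code for every failure
    meta = failure.request.meta
    website = meta['website']

    if failure.check(HttpError):
        # these exceptions come from HttpError spider middleware
        # you can get the non-200 response
        response = failure.value.response
        website.set_response_code(response.status, save=False)

    elif failure.check(DNSLookupError):
        website.set_response_code(WebSite.RESPONSE_CODE__DNS_LOOKUP_ERROR, save=False)

    elif failure.check(TimeoutError, TCPTimedOutError):
        website.set_response_code(WebSite.RESPONSE_CODE__TIMEOUT, save=False)
    else:
        website.set_response_code(WebSite.RESPONSE_CODE__UNKNOWN, save=False)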
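The branching in error() can be exercised without Scrapy. In this sketch isinstance stands in for Failure.check, the exception classes are local stand-ins named after the real Scrapy/Twisted ones, and the string codes replace the WebSite constants; all of these are assumptions for illustration. It shows that the checks run top to bottom and anything not matched ends up as UNKNOWN.

```python
class HttpError(Exception):
    """Stand-in for Scrapy's HttpError; carries the response status."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


class DNSLookupError(OSError):
    """Stand-in for twisted.internet.error.DNSLookupError."""


class TwistedTimeoutError(Exception):
    """Stand-in for twisted.internet.error.TimeoutError."""


class TCPTimedOutError(Exception):
    """Stand-in for twisted.internet.error.TCPTimedOutError."""


def classify(exc):
    """Return the code error() would record for this exception."""
    if isinstance(exc, HttpError):
        return exc.status  # the non-200 status of the response
    elif isinstance(exc, DNSLookupError):
        return "DNS_LOOKUP_ERROR"
    elif isinstance(exc, (TwistedTimeoutError, TCPTimedOutError)):
        return "TIMEOUT"
    return "UNKNOWN"  # every other failure still gets a code
```

Because every path returns a code, a failure that actually reaches error() should always be recorded; one that is not recorded most likely never reached the errback, or the errback raised before setting a code (for example a KeyError on meta['website']).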

Any idea how to do this?

Edit

Even some timeout errors get lost, such as this one:

2020-09-07 11:27:09 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.example.com/>
Traceback (most recent call last):
  File "/home/ubuntu/.virtualenvs/newswd/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/ubuntu/.virtualenvs/newswd/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/ubuntu/.virtualenvs/newswd/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/home/ubuntu/.virtualenvs/newswd/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/ubuntu/.virtualenvs/newswd/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 374, in _cb_timeout
    raise TimeoutError("Getting %s took longer than %s seconds." % (url, timeout))
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://www.example.com/ took longer than 30.0 seconds..
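The last line of the traceback names twisted.internet.error.TimeoutError, which is a different class from Python's built-in TimeoutError. If the spider module does not import it from twisted.internet.error, the name TimeoutError in failure.check(TimeoutError, TCPTimedOutError) refers to the built-in and will not match. The sketch below models this with stand-in classes assumed to follow Twisted's hierarchy (ConnectError, then UserError, then TimeoutError); isinstance again stands in for Failure.check.

```python
class ConnectError(Exception):
    """Stand-in for twisted.internet.error.ConnectError."""


class UserError(ConnectError):
    """Stand-in for twisted.internet.error.UserError."""


class TwistedTimeoutError(UserError):
    """Stand-in for twisted.internet.error.TimeoutError."""


def matches(exc, *types):
    """Rough model of failure.check(*types): does exc match any of them?"""
    return isinstance(exc, types)


exc = TwistedTimeoutError("Getting https://www.example.com/ took longer than 30.0 seconds.")

# Built-in TimeoutError (what the name means without the Twisted import)
print(matches(exc, TimeoutError))         # False
# The Twisted class itself
print(matches(exc, TwistedTimeoutError))  # True
```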
