How to get the response body in a Scrapy downloader middleware

from scrapy.downloadermiddlewares.retry import RetryMiddleware


class ManualRetryMiddleware(RetryMiddleware):
    def process_response(self, request, response, spider):
        # Only act on spiders that define XPaths to look for.
        if not spider.retry_if_not_found:
            return response
        # Non-text responses with a non-200 status go to the built-in retry logic.
        if not hasattr(response, 'text') and response.status != 200:
            return super(ManualRetryMiddleware, self).process_response(request, response, spider)
        found = False
        for xpath in spider.retry_if_not_found:
            if response.xpath(xpath).extract():
                found = True
                break
        if not found:
            return self._retry(request, "Didn't find anything useful", spider)
        return response

2 answers

User

Answer 1 · edited 2024-10-01 04:51:32

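Also pay attention to where the middleware sits in the chain. It must come before scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware, i.e. have a lower order value than it in DOWNLOADER_MIDDLEWARES (590 by default). process_response() methods are called in decreasing order of that value, so this guarantees the body has already been decompressed when your middleware sees it. Otherwise you may end up trying to decode compressed data, which does not work. To find out whether a response is compressed, look for Content-Encoding: gzip in response.headers.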
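A minimal settings.py sketch of that ordering, assuming the middleware lives in a hypothetical module myproject/middlewares.py and that you want it to replace the built-in RetryMiddleware (which it subclasses) rather than run alongside it:

```python
# settings.py (sketch). "myproject.middlewares.ManualRetryMiddleware" is a
# hypothetical import path; point it at wherever your class actually lives.
DOWNLOADER_MIDDLEWARES = {
    # Disable the stock retry middleware so retries are not handled twice
    # (assumption: the subclass is meant to replace it).
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,
    # process_response() runs in decreasing order of these numbers, so any
    # value below HttpCompressionMiddleware's default of 590 means this
    # middleware receives the already-decompressed body.
    "myproject.middlewares.ManualRetryMiddleware": 550,
}
```

Reusing 550, the slot the built-in RetryMiddleware occupies by default, is only one option; any value under 590 satisfies the ordering requirement.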

User

Answer 2 · edited 2024-10-01 04:51:32

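The reason response doesn't support the xpath method is that the response argument of a downloader middleware's process_response method is a scrapy.http.Response object (see the documentation). Only scrapy.http.TextResponse (and its subclasses such as scrapy.http.HtmlResponse) has the xpath method. So before using xpath, create an HtmlResponse object from response. The relevant part then becomes: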

...
# Wrap the raw body in an HtmlResponse, which supports .xpath()
new_response = scrapy.http.HtmlResponse(response.url, body=response.body)
if new_response.xpath(xpath).extract():
    found = True
    break
...
