Scrapy DEFAULT_REQUEST_HEADERS not working

# settings.py
DEFAULT_REQUEST_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4',
}

# -*- coding: utf-8 -*-
import scrapy


class HotSpider(scrapy.Spider):
    name = "hot"
    allowed_domains = ["qiushibaike.com"]
    start_urls = (
        'http://www.qiushibaike.com/hot',
    )

    def parse(self, response):
        # Print the HTTP status to see whether the site accepted the request
        print('\n', response.status, '\n')

# -*- coding: utf-8 -*-
import scrapy


class HotSpider(scrapy.Spider):
    name = "hot"
    allowed_domains = ["qiushibaike.com"]
    start_urls = (
        'http://www.qiushibaike.com/hot',
    )
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Accept-Language': 'en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4',
    }

    # Workaround: attach the headers explicitly to every start request.
    # (This originally overrode make_requests_from_url, which Scrapy has
    # deprecated; overriding start_requests does the same job.)
    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, headers=self.headers)

    def parse(self, response):
        # Print the HTTP status to see whether the site accepted the request
        print('\n', response.status, '\n')

1 Answer

User

#1 · Posted 2024-09-28 03:17:11

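I can see that the User-Agent header is indeed not set correctly when the default headers middleware is used, and this particular site rejects requests that do not carry the expected User-Agent header.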

The recommended way to set the user agent for your crawler is the USER_AGENT setting key.

For example:

# settings.py
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36"

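The User-Agent not being set when default headers are used could be a bug in Scrapy, or it could be intended behaviour that is documented somewhere. You would need to research this further; if it does turn out to be a bug, it would be worth filing a bug report in the Scrapy GitHub repo.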
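One way to reason about the behaviour described in this answer is a toy model of the two steps that fill in request headers. It assumes, without checking against any particular Scrapy release, that both the default-headers step and the user-agent step only set a header that is still missing (like dict.setdefault), so whichever step runs first decides the final User-Agent. The function names and the fallback user-agent string below are hypothetical stand-ins, not Scrapy's API.

```python
# Toy model of two steps that fill in request headers before a request is sent.
# ASSUMPTION (not verified against Scrapy's source): each step only sets a
# header that is still missing, like dict.setdefault, so the first step to
# touch "User-Agent" wins. All names here are hypothetical stand-ins.


def fill_default_headers(headers, defaults):
    """Stand-in for a default-headers step (cf. DEFAULT_REQUEST_HEADERS)."""
    for name, value in defaults.items():
        headers.setdefault(name, value)
    return headers


def fill_user_agent(headers, user_agent):
    """Stand-in for a user-agent step (cf. the USER_AGENT setting)."""
    if user_agent:
        headers.setdefault("User-Agent", user_agent)
    return headers


def build_headers(order, defaults, user_agent):
    """Run the two steps in the given order and return the final headers."""
    headers = {}
    for step in order:
        if step == "defaults":
            fill_default_headers(headers, defaults)
        elif step == "user_agent":
            fill_user_agent(headers, user_agent)
        else:
            raise ValueError(f"unknown step: {step!r}")
    return headers


DEFAULTS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4",
}
# Placeholder for a library's built-in fallback user agent; not a real version string.
FALLBACK_UA = "Scrapy/x.y (+https://scrapy.org)"

# If the user-agent step runs first, its fallback value sticks and the
# User-Agent from DEFAULTS is silently ignored.
ua_first = build_headers(["user_agent", "defaults"], DEFAULTS, FALLBACK_UA)
# If the default-headers step runs first, the browser User-Agent sticks.
defaults_first = build_headers(["defaults", "user_agent"], DEFAULTS, FALLBACK_UA)

print(ua_first["User-Agent"])
print(defaults_first["User-Agent"])

# Setting the user-agent value itself to the browser string makes the order irrelevant.
for order in (["user_agent", "defaults"], ["defaults", "user_agent"]):
    print(order, build_headers(order, DEFAULTS, DEFAULTS["User-Agent"])["User-Agent"])
```

Under that assumption, setting USER_AGENT to the browser string, as recommended above, yields the same User-Agent whichever step runs first, which would be consistent with it being the more robust fix.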
