"TooManyRedirects" error when using Google's IP address instead of the domain name

Posted 2024-10-01 17:40:48

I am trying to scrape Google search results. When I use the domain name, as below, everything works fine:

import requests
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
requests.get('https://google.com/search?q={}'.format('movie'),\
    verify=False, headers={'User-Agent': user_agent})

But when I request Google by IP address instead:

requests.get('https://216.58.207.78/search?q={}'.format('movie'),\
    verify=False, headers={'User-Agent': user_agent, 'host': 'google.com'})

I get the following error:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/mohammad/myfiles/gitRepo/telesearch/env/lib/python3.6/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/home/mohammad/myfiles/gitRepo/telesearch/env/lib/python3.6/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/mohammad/myfiles/gitRepo/telesearch/env/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/mohammad/myfiles/gitRepo/telesearch/env/lib/python3.6/site-packages/requests/sessions.py", line 668, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/home/mohammad/myfiles/gitRepo/telesearch/env/lib/python3.6/site-packages/requests/sessions.py", line 668, in <listcomp>
    history = [resp for resp in gen] if allow_redirects else []
  File "/home/mohammad/myfiles/gitRepo/telesearch/env/lib/python3.6/site-packages/requests/sessions.py", line 165, in resolve_redirects
    raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

How can I fix this?


1 answer
User
#1 · Posted 2024-10-01 17:40:48

Fix it by adding www. to the Host header:

requests.get('https://216.58.207.78/search?q={}'.format('movie'),\
    verify=False, headers={'User-Agent': user_agent, 'host': 'www.google.com'})

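Explanation

This happens because you set the Host HTTP header to google.com.

When Google receives your request, it sees that the Host header asks for google.com, so it redirects you to www.google.com. But when requests follows that redirect, it sends the same headers you originally supplied, including Host: google.com. The server therefore redirects you again, and so on, until requests gives up after 30 redirects.

You can also remove the Host header altogether; as far as I can see, it makes no difference.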
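
To make the loop easier to see, here is a toy model of it that needs no network access. It is only a sketch: fake_server and fetch are hypothetical names invented for illustration, and the redirect rule is an assumption based on the behaviour described above, not Google's actual logic or the real requests internals.

```python
# Toy model of the redirect loop; nothing here touches the network.
MAX_REDIRECTS = 30  # the limit reported in the traceback


def fake_server(host_header):
    """Hypothetical server: redirect unless Host is already the www name."""
    if host_header.lower() != "www.google.com":
        return 301, "https://www.google.com/search?q=movie"
    return 200, None


def fetch(url, host_header, max_redirects=MAX_REDIRECTS):
    """Follow redirects while re-sending the same explicit Host header."""
    followed = 0
    while True:
        status, location = fake_server(host_header)
        if status == 200:
            return url, followed
        if followed >= max_redirects:
            raise RuntimeError("Exceeded %s redirects." % max_redirects)
        # The URL changes, but the caller-supplied Host header does not,
        # so the next response is the same redirect again.
        url = location
        followed += 1
```

Calling fetch(url, "www.google.com") returns straight away with zero redirects followed, while fetch(url, "google.com") keeps receiving the same redirect until the limit is hit, which mirrors the TooManyRedirects error in the question.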
