Python: httplib & requests; an HTTPS issue seems to cause a redirect, followed by a BadStatusLine exception

Posted 2024-05-18 08:34:52


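I am trying to collect some information from the Discogs website that I do not yet have. Unfortunately, I cannot seem to connect to the site via urllib2, httplib, or requests without running into a BadStatusLine exception.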

I believe this is because any request to http://www.discogs.com is redirected to https://www.discogs.com. I was able to confirm the redirect with the following code:

import requests

r_link = "http://www.discogs.com"
print("Trying " + r_link)
# Do not follow the redirect, so the 3xx response itself can be inspected.
r = requests.get(r_link, allow_redirects=False)
print(r.status_code, r.reason, r.history, r.headers['Location'])

This returns:

[output not preserved in the source]

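If I understand correctly, this means that any request to http://www.discogs.com will be redirected to https://www.discogs.com. One would therefore think the obvious solution is to send the request directly to https://www.discogs.com. Unfortunately, using the code above with https (that is, adding an s to the URL in r_link) results in a BadStatusLine error: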

Trying https://www.discogs.com
Traceback (most recent call last):
  File "start.py", line 26, in <module>
    r = requests.get(r_link, allow_redirects=False)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/adapters.py", line 426, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))

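Judging by the examples in the requests documentation, handling https links should not be a problem. Indeed, trying the code above with https://www.google.com yields a 302 response, and the redirect succeeds when the URL in r.headers['Location'] is used.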

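What is going wrong, and why? Is it because I made a mistake? Is it specific to my machine or setup? Is it specific to the Discogs server? I do not know how to diagnose this.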

Thanks.


1 answer
User
#1 · Posted 2024-05-18 08:34:52

Add a User-Agent header and the request will work:

import requests

# Send a browser-like User-Agent header with the request.
h = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
r_link = "https://www.discogs.com"
print("Trying " + r_link)
r = requests.get(r_link, headers=h)
print(r.status_code, r.reason, r.history, r.headers)
print(r.content)
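
Why a missing User-Agent can surface as BadStatusLine is easier to see against a local stand-in. The sketch below does not show Discogs' actual behavior, which this thread does not document. It assumes a server that silently closes the connection unless the User-Agent looks like a browser. The names serve_once and fetch are hypothetical helpers, and only the Python 3 standard library is used (http.client is the successor of httplib).

```python
import http.client
import socket
import threading


def serve_once(listener):
    """Accept one connection; reply only if the User-Agent looks like a browser."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        # Read until the end of the request headers.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        if b"user-agent: mozilla" in data.lower():
            conn.sendall(
                b"HTTP/1.1 200 OK\r\n"
                b"Content-Length: 2\r\n"
                b"Connection: close\r\n\r\nok"
            )
        # Assumed behavior: otherwise close without sending any status line.


def fetch(headers):
    """Send one GET to the stand-in server; return the status or the error name."""
    listener = socket.socket()
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/", headers=headers)
        return client.getresponse().status
    except http.client.BadStatusLine:
        # The client found no status line because the server sent nothing.
        return "BadStatusLine"
    finally:
        client.close()
        server.join()
        listener.close()


if __name__ == "__main__":
    print(fetch({}))
    print(fetch({"User-Agent": "Mozilla/5.0"}))
```

In Python 3 the exception raised in the first call is http.client.RemoteDisconnected, a subclass of BadStatusLine. requests wraps the same low-level failure in ConnectionError, as the traceback in the question shows.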

Here is a working example:

[example not preserved in the source]

If you want to log in:

import requests

h = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
login = "https://www.discogs.com/login?return_to=%2F"

# Replace your_user and your_pass with real credentials.
with requests.session() as s:
    r = s.post(login, data={"username": "your_user", "password": "your_pass", "Action.Login": ""}, headers=h)
    print(r.content)

If we run it, you will see that we end up at https://www.discogs.com/my:

In [27]: h = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}

In [28]: login = "https://www.discogs.com/login?return_to=%2F"

In [29]: with requests.session() as s:
   ....:         r = s.post(login, data={"username":"xxxxxxxx","password":"xxxxxxxx","Action.Login":""}, headers=h)
   ....:         print(r.url)
   ....:     
https://www.discogs.com/my
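
The question also mentions urllib2. As a rough sketch for its Python 3 successor, urllib.request, the default User-Agent can be inspected and replaced once on an opener, so that every later opener.open(...) call sends the browser-like value. The name BROWSER_UA is a hypothetical constant holding the same string as h above. Nothing here contacts the network, and whether this alone satisfies Discogs is an assumption carried over from the requests fix.

```python
import urllib.request

# Same browser-like string as in the requests examples (hypothetical constant name).
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"
)

opener = urllib.request.build_opener()

# The opener starts with a single default header of the form "Python-urllib/<version>".
default_ua = dict(opener.addheaders)["User-agent"]

# Replace the default so each request made through this opener carries BROWSER_UA.
opener.addheaders = [("User-Agent", BROWSER_UA)]

if __name__ == "__main__":
    print("default:", default_ua)
    print("now sending:", dict(opener.addheaders)["User-Agent"])
```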
