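I'm completely new to coding. I'm trying to write a program that collects data for me, but when my code opens the URL it fails with HTTPError: HTTP Error 405: Not Allowed. I'm using Python and have Beautiful Soup installed, so why do I get this error? I've tried different headers, but that didn't help. The code is below.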
from urllib.request import urlopen
from bs4 import BeautifulSoup
import urllib.request
import re
import numpy as np
# Opening the Builder website
html = "http://www.builderonline.com"
req = urllib.request.Request(html,headers={'User-Agent' : "Mozilla/5.0"})
soup = BeautifulSoup(urlopen(req).read(),"html.parser")
print ("end")
Error Messages:
Traceback (most recent call last):
File "test3.py", line 9, in <module>
soup = BeautifulSoup(urlopen(req).read(),"html.parser")
File "/Users/NAGS/anaconda/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/Users/NAGS/anaconda/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/Users/NAGS/anaconda/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/Users/NAGS/anaconda/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/Users/NAGS/anaconda/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/Users/NAGS/anaconda/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 405: Not Allowed
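This page is protected by a CAPTCHA and by a check against clients that do not run JavaScript. Try the following code:
If you want to get data from the site, I'd suggest using Selenium together with PhantomJS (a headless browser).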
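A minimal sketch of the headless-browser approach follows. PhantomJS support has been dropped from recent Selenium releases, so this sketch swaps in headless Chrome, which assumes Chrome and a matching driver are available on your machine. The function name `fetch_rendered_html` and the `wait_seconds` parameter are made up for illustration. A real browser runs the page's JavaScript, but the site's CAPTCHA may still stop it.

```python
def fetch_rendered_html(url, wait_seconds=10):
    """Load `url` in headless Chrome and return the rendered page source.

    Hypothetical helper: assumes the `selenium` package and Chrome are installed.
    """
    # Imported lazily so this file can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.set_page_load_timeout(wait_seconds)  # give up on very slow pages
        driver.get(url)
        return driver.page_source  # HTML after the browser has run the scripts
    finally:
        driver.quit()  # always release the browser process


# Usage (needs network access and a working Chrome setup):
# html = fetch_rendered_html("http://www.builderonline.com")
# soup = BeautifulSoup(html, "html.parser")
```

The returned string can be passed to BeautifulSoup exactly as in the question, so the parsing code does not need to change.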
For the 405 error:
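One way to see what the server is objecting to is to catch the exception and print the status line and the response headers it sent back. This is only a diagnostic sketch: `describe_http_error` and `fetch` are hypothetical helper names, and it does not by itself get past the block.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Summarise an HTTPError as text: status, reason, then response headers."""
    lines = ["HTTP {} {}".format(err.code, err.reason)]
    for name, value in err.headers.items():
        lines.append("{}: {}".format(name, value))
    return "\n".join(lines)


def fetch(url, user_agent="Mozilla/5.0"):
    """Return the body of `url`, or None after printing why the request failed."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 405.
        print(describe_http_error(err))
        return None


# Usage (needs network access):
# body = fetch("http://www.builderonline.com")
```

If the printed headers include `Allow`, it lists the methods the server says it accepts. A 405 on a plain GET, as here, more likely means the site is rejecting the client than that the method is wrong.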
Great tutorial HERE
Using requests and BeautifulSoup I can easily scrape the list tags:
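And the output (using pprint):

Maybe this has to do with the format of your headers. The site may be set up to check for malformed or incomplete headers. Try visiting https://httpbin.org/headers in your browser and use the User-Agent data listed there in your script.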
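As a rough sketch of that suggestion, you can build the request with a fuller, browser-like header set and inspect it before sending anything. The header values below are placeholders, not values known to work for this site; replace them with what https://httpbin.org/headers reports for your own browser. `BROWSER_HEADERS` and `build_request` are names invented for this example.

```python
import urllib.request

# Placeholder values (assumptions): copy the real ones from
# https://httpbin.org/headers as shown in your own browser.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Create a GET request carrying the given headers, without sending it."""
    return urllib.request.Request(url, headers=dict(headers))


req = build_request("http://www.builderonline.com")

# Inspect what would be sent. Note that urllib normalises header names,
# so "User-Agent" is stored as "User-agent".
print(req.get_method())
for name, value in req.header_items():
    print("{}: {}".format(name, value))

# To actually send it (needs network access):
# html = urllib.request.urlopen(req, timeout=10).read()
```

Building the request separately from sending it makes it easy to compare your script's headers line by line with what the browser sends.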