How to get the raw HTML text of a given URL using Python

import html2text
import urllib2

proxy = urllib2.ProxyHandler({'http': 'http://<proxy>:<pass>@<ip>:<port>'})
auth = urllib2.HTTPBasicAuthHandler()
opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.ndtv.com/india-news/this-stunt-for-a-facebook-like-got-the-hyderabad-youth-arrested-740851").read()
print html2text.html2text(html)

Traceback (most recent call last):
  File "t.py", line 8, in <module>
    html = urllib2.urlopen("http://www.ndtv.com/india-news/this-stunt-for-a-facebook-like-got-the-hyderabad-youth-arrested-740851").read()
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 110] Connection timed out>

1 Answer

User

#1 · Posted on 2024-06-26 13:29:10

If you don't need SSL, this script should work in Python 2.7.x:

import urllib
url = "http://stackoverflow.com"
f = urllib.urlopen(url)
print f.read()

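In Python 3.x, use urllib.request instead of urllib.

This is because urllib2 exists only in Python 2; in Python 3 its functionality was merged into the urllib package (as urllib.request and urllib.error).

The http:// prefix in the URL is required.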
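For Python 3, the same idea can be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the function name fetch_html and its proxy and timeout parameters are hypothetical names introduced here for illustration, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
import urllib.request


def fetch_html(url, proxy=None, timeout=10):
    """Return the body of `url` decoded to text (hypothetical helper)."""
    handlers = []
    if proxy:
        # Optional: route http and https traffic through the given proxy URL.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
    opener = urllib.request.build_opener(*handlers)
    # The URL must include its scheme, e.g. "http://stackoverflow.com".
    with opener.open(url, timeout=timeout) as response:
        raw = response.read()  # raw bytes of the response body
        # Assumption: use UTF-8 when no charset is declared in the headers.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling fetch_html("http://stackoverflow.com") should return the page source as a string. If the connection cannot be established, for example because it times out as in the traceback above, the call raises urllib.error.URLError, which the caller can catch.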
