Python web crawler: connection timed out



I am trying to implement a simple web crawler, and I have written some simple code to start with. There are two modules, fetcher.py and crawler.py. Here are the files:

In fetcher.py:

<pre><code>import urllib2
import re

def fetcher(s):
    "fetch a web page from a url"
    try:
        req = urllib2.Request(s)
        urlResponse = urllib2.urlopen(req).read()
    except urllib2.URLError as e:
        print e.reason
        return

    p,q = s.split("//")
    d = q.split("/")
    fdes = open(d[0],"w+")
    fdes.write(str(urlResponse))
    fdes.seek(0)
    return fdes

if __name__ == "__main__":
    defaultSeed = "http://www.python.org"
    print fetcher(defaultSeed)
</code></pre>

In crawler.py:

^{pr2}$

The problem is that when I run crawler.py, it works fine for the first 4-5 links, then it hangs, and after a minute it gives me the following error:

<pre><code>[Errno 110] Connection timed out
Traceback (most recent call last):
  File "crawler.py", line 37, in <module>
    crawler("http://www.python.org/",7)
  File "crawler.py", line 34, in crawler
    crawler(newSeed,n-1)
  File "crawler.py", line 34, in crawler
    crawler(newSeed,n-1)
  File "crawler.py", line 34, in crawler
    crawler(newSeed,n-1)
  File "crawler.py", line 34, in crawler
    crawler(newSeed,n-1)
  File "crawler.py", line 34, in crawler
    crawler(newSeed,n-1)
  File "crawler.py", line 33, in crawler
    newSeed = parse(fdes,newLinks.tell())
  File "crawler.py", line 11, in parse
    soup = BeautifulSoup(fd)
  File "/usr/lib/python2.7/dist-packages/bs4/__init__.py", line 169, in __init__
    self.builder.prepare_markup(markup, from_encoding))
  File "/usr/lib/python2.7/dist-packages/bs4/builder/_lxml.py", line 68, in prepare_markup
    dammit = UnicodeDammit(markup, try_encodings, is_html=True)
  File "/usr/lib/python2.7/dist-packages/bs4/dammit.py", line 191, in __init__
    self._detectEncoding(markup, is_html)
  File "/usr/lib/python2.7/dist-packages/bs4/dammit.py", line 362, in _detectEncoding
    xml_encoding_match = xml_encoding_re.match(xml_data)
TypeError: expected string or buffer
</code></pre>

Can anyone help me with this? I am very new to Python, and I don't know why the connection times out after a while.
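Reading the traceback, the final TypeError looks like a consequence of the timeout rather than a separate bug. When urlopen raises URLError, fetcher prints e.reason (the "[Errno 110] Connection timed out" line) and returns nothing, so the caller appears to pass None on to BeautifulSoup. Why that particular host timed out cannot be determined from the post. A minimal sketch of one way to guard against this is below. It is written for Python 3 (urllib.request in place of urllib2), and the names fetch, crawl, opener and fetch_page are hypothetical helpers made up for illustration, not part of the original code.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the request fails.

    `opener` is a hypothetical hook (not in the original code) that lets
    the network call be swapped out, for example in a test.
    """
    try:
        # An explicit timeout stops an unresponsive host from blocking
        # the crawler for as long as the operating system default allows.
        response = opener(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except OSError as exc:
        # urllib.error.URLError and socket.timeout are both subclasses
        # of OSError in Python 3, so this covers failed and timed-out
        # requests alike.
        print("skipping %s: %s" % (url, exc))
        return None


def crawl(urls, fetch_page=fetch):
    """Fetch each URL and keep only the pages that were downloaded."""
    pages = {}
    for url in urls:
        body = fetch_page(url)
        if body is None:
            # Never hand None to the HTML parser; move on to the next link.
            continue
        pages[url] = body
    return pages
```

The same two ideas should carry over to the Python 2 code in the question: urllib2.urlopen also accepts a timeout argument, and the caller can check the return value of fetcher for None before parsing it.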


