Avoiding 503 errors with urllib2

    import sys # Used to add the BeautifulSoup folder the import path
    import urllib2 # Used to read the html document

    if __name__ == "__main__":
        ### Import Beautiful Soup
        ### Here, I have the BeautifulSoup folder in the level of this Python script
        ### So I need to tell Python where to look.
        sys.path.append("./BeautifulSoup")
        from BeautifulSoup import BeautifulSoup

        ### Create opener with Google-friendly user agent
        opener = urllib2.build_opener()
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]

        ### Open page & generate soup
        ### the "start" variable will be used to iterate through 10 pages.
        for start in range(0,10):
            url = "http://www.google.com/search?q=site:stackoverflow.com&start=" + str(start*10)
            page = opener.open(url)
            soup = BeautifulSoup(page)

            ### Parse and find
            ### Looks like google contains URLs in <cite> tags.
            ### So for each cite tag on each page (10), print its contents (url)
            for cite in soup.findAll('cite'):
                print cite.text

2 Answers

User

#1 · Edited 2024-06-26 11:01:19

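As Ettore said, scraping the search results is against our ToS. However, check out the WebSearch API, specifically the bottom section of the documentation, which should give you a hint about how to access the API from non-JavaScript environments.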

User

#2 · Edited 2024-06-26 11:01:19

The Google Terms of Service do not allow automated queries. For more information, see this article: Unusual traffic from your computer, and also the Google Terms of Service.
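For a server that does permit automated access, a 503 usually means "slow down", and HTTP allows the response to carry a `Retry-After` header. Below is a minimal sketch of detecting a 503 and backing off before retrying. The helper name `open_with_backoff` and its parameters are made up for illustration and are not part of urllib2. It does not make automated Google queries acceptable under the ToS.

```python
import time

try:  # Python 2, as used in the question
    from urllib2 import HTTPError
except ImportError:  # Python 3 fallback
    from urllib.error import HTTPError


def open_with_backoff(open_func, url, max_tries=3, base_delay=1.0, sleep=time.sleep):
    """Call open_func(url); on HTTP 503, wait and retry up to max_tries attempts.

    open_with_backoff is a hypothetical helper, not a urllib2 function.
    """
    for attempt in range(max_tries):
        try:
            return open_func(url)
        except HTTPError as err:
            # Re-raise anything that is not a 503, and the final failed attempt.
            if err.code != 503 or attempt == max_tries - 1:
                raise
            # Prefer the server's Retry-After value when it is a plain number of seconds.
            headers = err.hdrs
            retry_after = headers.get("Retry-After") if headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                # Otherwise fall back to exponential backoff: 1s, 2s, 4s, ...
                delay = base_delay * (2 ** attempt)
            sleep(delay)
```

With the opener from the question, this would be called as `page = open_with_backoff(opener.open, url)`. Passing the open function and the sleep function in as arguments keeps the retry logic testable without touching the network.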
