Web crawler class not working

import re, urllib

class WebCrawler:
    """A Simple Web Crawler That Is Readily Extensible"""
    def __init__():
        size = 1

    def containsAny(seq, aset):
        for c in seq:
            if c in aset: return True
        return False

    def crawlUrls(url, depth):
        textfile = file('UrlMap.txt', 'wt')
        urlList = [url]
        size = 1
        for i in range(depth):
            for ee in range(size):
                if containsAny(urlList[ee], "http://"):
                    try:
                        webpage = urllib.urlopen(urlList[ee]).read()
                        break
                    except:
                        print "Following URL failed!"
                        print urlList[ee]
            for ee in re.findall('''href=["'](.[^"']+)["']''',webpage, re.I):
                print ee
                urlList.append(ee)
                size+=1
                textfile.write(ee+'\n')

myCrawler = WebCrawler
myCrawler.crawlUrls("http://www.wordsmakeworlds.com/", 2)

Traceback (most recent call last):
  File "C:/Users/Noah Huber-Feely/Desktop/Python/WebCrawlerClass", line 33, in <module>
    myCrawler.crawlUrls("http://www.wordsmakeworlds.com/", 2)
TypeError: unbound method crawlUrls() must be called with WebCrawler instance as first argument (got str instance instead)

1 Answer

User

Answer #1 · Posted 2024-09-27 07:28:44

You have two problems. The first is this line:

myCrawler = WebCrawler

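This does not create an instance of WebCrawler; it only binds the name myCrawler to the WebCrawler class itself (in effect, an alias for the class). That is why the traceback complains that the first argument is a str rather than a WebCrawler instance. You should write this instead: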

myCrawler = WebCrawler()
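
To make the difference concrete, here is a minimal sketch. The class below is a trimmed-down stand-in for the one in the question, not the original code: its crawlUrls only returns its arguments so nothing touches the network, and it already takes self, which the second point covers.

```python
class WebCrawler:
    """Stand-in for the class in the question (hypothetical, trimmed down)."""

    def crawlUrls(self, url, depth):
        # Return the arguments so the call can be inspected without network access.
        return (url, depth)


alias = WebCrawler        # no parentheses: a second name for the class itself
instance = WebCrawler()   # parentheses: calls the class and creates an instance

alias_is_class = alias is WebCrawler
instance_is_crawler = isinstance(instance, WebCrawler)

# Calling through the instance works; the instance is passed as self automatically.
result = instance.crawlUrls("http://www.wordsmakeworlds.com/", 2)
```

Calling alias.crawlUrls(...) with the same two arguments would fail, because no instance is available to fill the first parameter.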

The second is this line:

def crawlUrls(url, depth):

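Python instance methods receive the receiver (the instance the method is called on) as their first argument. By convention it is named self, although technically you can call it anything. So you should change the method signature to: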

def crawlUrls(self, url, depth):

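(You need to do the same for the other methods you defined, __init__ and containsAny. The call to containsAny inside crawlUrls then has to go through the instance as well: self.containsAny(...).)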
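
As a rough illustration of how the methods look once every one of them takes self, here is a sketch that reuses the question's containsAny logic and href regex. It is not the original crawler: extractLinks is a hypothetical helper added for the example, and it works on a string so that nothing is fetched from the network.

```python
import re


class WebCrawler:
    """Trimmed-down sketch of the question's class; no network access."""

    def __init__(self):
        # State lives on the instance and is reached through self.
        self.urlList = []

    def containsAny(self, seq, aset):
        # Same logic as the question's helper: True if any item of seq is in aset.
        for c in seq:
            if c in aset:
                return True
        return False

    def extractLinks(self, html):
        # Hypothetical helper (not in the original): applies the question's
        # href regex to a string and records the matches on the instance.
        links = re.findall('''href=["'](.[^"']+)["']''', html, re.I)
        self.urlList.extend(links)
        return links


myCrawler = WebCrawler()
links = myCrawler.extractLinks(
    '<a href="http://example.com/a">A</a> <a href=\'/b\'>B</a>'
)

# These two calls are equivalent; the first passes myCrawler as self implicitly.
implicit = myCrawler.containsAny("abc", "cde")
explicit = WebCrawler.containsAny(myCrawler, "abc", "cde")
```

The explicit form on the last line is what Python does behind the scenes for every method call on an instance, which is why each method needs a first parameter to receive it.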
