TypeError: 'str' object is not callable in a multithreaded web crawler

Posted 2024-10-01 22:34:13


I am trying to write a Python web crawler and make it multithreaded. The main problem I am running into is getting the code to run concurrently with ThreadPoolExecutor:

def crawl(self, url):
    for link in self.get_links(url):
      if link in self.visited:
        continue
      print("Scraping URL: {}".format(link))
      #if not visited add to visited set O(1) time
      self.visited.add(link)
      info = self.extract_info(link)
      return("word")

For now, my crawl function is only meant to return some string.

I have a start function that launches a pool with at most 2 workers:

  def start(self):
    job = self.pool.submit(self.crawl(self.startingUrl))
    job.add_done_callback(self.appendText)

The problem shows up in the appendText function, where I try to turn the future object back into a string so that I can write it to a file:

def appendText(self,res):
    print("HELLO!")
    print("res = ", res.result())

    with open("Crawled.txt","w") as file:
      des = "Description: {}".format(res.result())
      key = "Keywords:{}".format(res.result())
      file.write(des)
      file.write(key)

I end up with a TypeError, and I have been looking for a way to convert a future object to a string:

HELLO!
Traceback (most recent call last):
  File "crawler/crawler.py", line 78, in <module>
    crawler.start()
  File "crawler/crawler.py", line 73, in start
    job.add_done_callback(self.appendText)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 403, in add_done_callback
    fn(self)
  File "crawler/crawler.py", line 53, in appendText
    print("res = ", res.result())
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
TypeError: 'str' object is not callable

Where am I going wrong here? Thanks, everyone!


1 Answer
User
#1 · Posted 2024-10-01 22:34:13

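crawl returns a string. As the code is written now, you call crawl yourself and then hand the string it returns to submit. The pool then tries to execute that string as if it were the function it was given, which causes the error.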
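To see the failure in isolation, here is a minimal sketch. The crawl function below is a hypothetical stand-in that only returns a string, and the URL is a placeholder; nothing is actually fetched.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(url):
    # Hypothetical stand-in for the real crawler: it only returns a string.
    return "word"


with ThreadPoolExecutor(max_workers=2) as pool:
    # crawl() runs right here, in the calling thread, and returns "word",
    # so the pool receives a string instead of a function.
    bad_job = pool.submit(crawl("http://example.com"))
    # The worker thread tried to call the string and stored the exception
    # on the future; exception() waits for the job and returns it.
    error = bad_job.exception()

print(type(error).__name__, error)
```

The exception is raised inside the worker thread and kept on the future, which is why it only surfaces later, when result() is called in the callback.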

You want to pass the uncalled function to submit and let the pool call crawl for you:

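self.pool.submit(self.crawl, self.startingUrl)

The first argument is the function you want the pool to call, and any further arguments are passed on to that function when it runs. submit takes the callable positionally rather than through target= and args= keywords as threading.Thread does.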
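As for turning the future back into a string: once the function itself is submitted, the callback receives the finished future and result() returns whatever crawl returned. A minimal sketch, assuming a stand-in crawl function and a hypothetical append_text callback that collects results in a list instead of writing a file:

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(url):
    # Hypothetical stand-in for the real crawler.
    return "crawled " + url


results = []


def append_text(future):
    # The callback is handed the finished Future;
    # result() returns the string that crawl() returned.
    results.append(future.result())


with ThreadPoolExecutor(max_workers=2) as pool:
    # Pass the function and its argument separately; the pool makes the call.
    job = pool.submit(crawl, "http://example.com")
    job.add_done_callback(append_text)

# Leaving the with block waits for the pool to finish its work.
print(results)
```

No conversion step is needed: result() already gives back the plain string, so it can be formatted and written to a file directly.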

You can get roughly the same result with a lambda:

self.pool.submit(lambda: self.crawl(self.startingUrl))

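Wrapping the call in a lambda defers its execution until the pool runs it. Lambdas carry some overhead, though, so prefer the first approach. I am including this one for reference.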
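The two forms can be compared side by side in a small sketch. The crawl function and start_url here are illustrative stand-ins, not the asker's real code:

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(url):
    # Hypothetical stand-in for the real crawler.
    return "crawled " + url


start_url = "http://example.com"

with ThreadPoolExecutor(max_workers=2) as pool:
    # Form 1: hand over the function and its argument separately.
    direct = pool.submit(crawl, start_url)
    # Form 2: hand over a zero-argument lambda that makes the call later.
    wrapped = pool.submit(lambda: crawl(start_url))
    direct_result = direct.result()
    wrapped_result = wrapped.result()

print(direct_result == wrapped_result)
```

In both cases the pool receives something callable, and crawl only runs inside a worker thread.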
