Why is the value I pass as a str treated as bytes?

Posted 2024-10-04 03:20:09

I have a function that expects a str value, but when I run it, the error says the value is bytes:

 Traceback (most recent call last):
  File "C:\Users\sdand\Documents\Python\Engine\engine.py", line 4, in <module>
    print (find.crawl_web('https://google.com',4))
  File "C:\Users\sdand\Documents\Python\Engine\finder.py", line 68, in crawl_web
    links = self.get_all_links(content)
  File "C:\Users\sdand\Documents\Python\Engine\finder.py", line 20, in get_all_links
    url, endpos = self.get_next_target(page)
  File "C:\Users\sdand\Documents\Python\Engine\finder.py", line 7, in get_next_target
    start_link = s.find('<a href=')
TypeError: a bytes-like object is required, not 'str'

This is the function that calls get_all_links:

def crawl_web(self,seed, max_depth):
        tocrawl = [seed]
        crawled = []
        next_depth = []
        depth = 0
        index=[]

        while tocrawl and depth <= max_depth:
            page = tocrawl.pop()
            if page not in crawled:
                # here content is str
                content = self.get_page(page)
                self.add_page_to_index(index,page,content)
                links = self.get_all_links(content)
                self.union(next_depth,links)
                crawled.append(page)
            if not tocrawl:
                tocrawl, next_depth = next_depth, []
                depth = depth+1
        return index

This is get_page:

def get_page(self,url):
        try:
            import urllib.request

            return  urllib.request.urlopen(url).read()
        except:
            return ""

This is get_all_links:

def get_all_links(self,page):
        # but here it is bytes, I don't know why
        links=[]
        while True:
            url, endpos = self.get_next_target(page)
            print(url)
            if url is not None:
                links.append(url)
                page = page[endpos:]
            else:
                break
        return links

I don't understand why my str variable content turns into bytes inside get_all_links. Can someone explain what is happening and how I can fix it?


1 answer
Anonymous user
#1 · Posted 2024-10-04 03:20:09

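You may not have realized that .read() returns a bytes object, not a str, so content is already bytes when crawl_web passes it to get_all_links; nothing converts it along the way. The TypeError appears because get_next_target then searches those bytes with a str pattern, s.find('<a href='). Working with bytes is often recommended when web scraping, but the simplest fix is to decode the result to str in get_page: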

return urllib.request.urlopen(url).read().decode('utf-8')
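To see the difference without touching the network, here is a minimal sketch. It assumes a made-up HTML snippet standing in for what .read() returns, and first_link is a hypothetical helper written for this illustration, not the asker's get_next_target (which is not shown in the question). It reproduces the failure, applies the decode fix, and shows the alternative of staying in bytes and searching with a bytes pattern.

```python
# Stand-in for what urllib.request.urlopen(url).read() returns: raw bytes.
# The HTML below is invented for illustration; no network access is needed.
raw = b'<html><a href="https://example.com/a">A</a></html>'


def first_link(page):
    """Hypothetical helper: return the first quoted href value in a str page, or None."""
    start_link = page.find('<a href=')  # str pattern: raises TypeError if page is bytes
    if start_link == -1:
        return None
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    if start_quote == -1 or end_quote == -1:
        return None
    return page[start_quote + 1:end_quote]


# 1. Reproduce the problem: searching bytes with a str pattern fails.
try:
    first_link(raw)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# 2. The fix from the answer: decode the bytes to str before searching.
text = raw.decode('utf-8')
link_from_text = first_link(text)

# 3. Alternative: keep the bytes and search with a bytes pattern instead.
position_in_bytes = raw.find(b'<a href=')

print(raised_type_error, link_from_text, position_in_bytes)
```

One caveat with the decode fix: .decode('utf-8') raises UnicodeDecodeError when a page is not valid UTF-8, and the bare except in get_page would silently turn that into an empty string. Passing errors='replace' to decode is one way to tolerate such pages, at the cost of replacing undecodable bytes with a placeholder character.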
