Beautiful Soup 4 does not print correctly (Python 3)

import urllib.request

from bs4 import BeautifulSoup

data = urllib.request.urlopen('www.site.com').read()  # placeholder URL; a real one needs a scheme such as http://
soup = BeautifulSoup(data, 'lxml')

stat = soup.find('div', {'style': 'padding-left: 10px'})
dialog = stat.findChildren('p')

childlist = []
for child in dialog:
    childtext = child.get_text()  # have also tried child.string (exactly the same result)
    childlist.append(childtext.encode('utf-8', 'ignore'))  # have also tried str(childtext.encode('utf-8', 'ignore'))

print(childlist)

1 answer

User

#1 · Posted 2024-10-03 15:34:36

First, you get output of the form b'stuff' because you are calling .encode(), which returns a bytes object. If you want to print strings for people to read, keep them as strings!
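To make the difference concrete, here is a minimal sketch of what .encode() does to a string. The sample text is a made-up stand-in for whatever child.get_text() returns on the real page.

```python
# Hypothetical sample standing in for child.get_text(); not taken from the real page.
sample = "caf\u00e9"

# .encode() turns the str into a bytes object, which prints as b'...'
as_bytes = sample.encode("utf-8", "ignore")
print(type(as_bytes).__name__)  # bytes
print(as_bytes)                 # b'caf\xc3\xa9'

# Leaving the value as a str prints the readable text.
print(type(sample).__name__)    # str
print(sample)

# If bytes are all you have, .decode() turns them back into a str.
print(as_bytes.decode("utf-8") == sample)  # True
```

In the question's loop, this would amount to appending childtext itself instead of childtext.encode(...).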

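As a guess, I assume you want to print the strings from the HTML nicely, the way they appear in a browser. For that you need to decode the HTML entity encoding of the strings, as described in this SO answer, which for Python 3.5 means: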

import html
html.unescape(childtext)

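Among other things, this converts any &nbsp; sequences in the HTML string into '\xa0' characters, which print as spaces. However, if you want lines to break at these characters, even though &nbsp; literally means "non-breaking space", you must replace them with ordinary spaces before printing, for example with x.replace('\xa0', ' ').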
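The two steps can be combined into one small helper. This is only a sketch: clean_text is a hypothetical name, and the raw string is an invented sample rather than text from the page in the question.

```python
import html


def clean_text(raw):
    """Decode HTML entities, then swap non-breaking spaces for ordinary ones.

    clean_text is a hypothetical helper name used for illustration only.
    """
    text = html.unescape(raw)        # "&nbsp;" becomes "\xa0", "&amp;" becomes "&"
    return text.replace("\xa0", " ")  # make the spaces ordinary so lines can wrap


# Invented sample input containing two HTML entities.
raw = "Hello&nbsp;world &amp; friends"

print(repr(html.unescape(raw)))  # the non-breaking space shows up as \xa0 in repr
print(clean_text(raw))           # Hello world & friends
```

Printing a list shows the repr of each item, which is why '\xa0' appears literally in print(childlist); printing the items one by one shows them as plain spaces.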
