Output when using BeautifulSoup in Python

from BeautifulSoup import *
from urllib import urlopen

def parseWithSoup(url):
    print "Reading:" , url
    html = urlopen(url).read().lower()
    bs = BeautifulSoup(html)
    table = bs.find(lambda tag: tag.name=='table' and tag.has_key('id') and tag['id']=="tblt_table")
    rows = table.findAll(lambda tag: tag.name=='tr')
    rows.pop(0) #first row is header
    for row in rows:
        tags = row.findAll(lambda tag: tag.name=='a')
        content = []
        for tagcontent in tags:
            content.append(tagcontent.string)
        print content

if __name__ == '__main__':
    content = "http://www.teamliquid.net/tlpd/sc2-international/games#tblt-5018-1-1-DESC"
    metSoup = parseWithSoup(content)

2 Answers

User

#1 · edited 2024-09-19 23:27:03

What you are seeing are Python unicode strings.

See the Python documentation

http://docs.python.org/howto/unicode.html

for how to handle unicode strings correctly.

User

#2 · edited 2024-09-19 23:27:03

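The u marks a Unicode string. As a programmer, it changes nothing for you and you should ignore it. Treat these values just like ordinary strings. You really do want that u there.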

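Note that all Beautiful Soup output is unicode. That is a good thing, because if you run into any Unicode characters while scraping, you will not have any problems. If you really want to get rid of the u (I don't recommend it), you can use the unicode string's encode() method, which converts it to a byte string in the encoding you choose.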
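As a rough illustration of the point above, here is a minimal sketch. The values in content are made up and only stand in for what tagcontent.string might return; they are not taken from the real page. It suggests that the u prefix (and the quotes) come from printing a list, which shows the repr() of each element, rather than from the text itself, and that encode() is what produces bytes. In Python 3, text strings are Unicode by default and the prefix no longer appears in the repr.

```python
# -*- coding: utf-8 -*-
# Hypothetical cell values, standing in for what tagcontent.string returns.
content = [u"zerg", u"terran", u"caf\xe9"]

# Printing a list shows the repr() of every element: quotes, and in
# Python 2 also the u prefix. This is what "print content" displays.
as_list = repr(content)

# Joining the elements (or printing them one by one) gives the plain text.
as_text = u", ".join(content)

# encode() converts Unicode text to bytes in the chosen encoding.
as_bytes = as_text.encode("utf-8")

# decode() goes the other way, from bytes back to Unicode text.
round_trip = as_bytes.decode("utf-8")

print(as_list)
print(as_text)
```

Printing as_text instead of the list is usually enough to make the u disappear from the output without giving up Unicode handling.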
