How to change the unicode format when using get_text() in Beautiful Soup

import requests
from pattern import web
from bs4 import BeautifulSoup
from pandas import *

url = 'http://www.mouthshut.com/product-reviews/amazonin-reviews-925670774-srch'
r = requests.get(url)
bs = BeautifulSoup(r.text)
mouthrev = []
Title = []
for revlist in bs.find_all("li", "reviewdetails openshare"):
    title = revlist.find_all('div', 'reviewtitle fl')
    title = [g.get_text(strip=True) for g in title]
    for parent in revlist.find_all("div", itemprop='description'):
        review = parent.find_all('p')
        review = [g.get_text(strip=True) for g in review]
        mouthrev.append(review)
        Title.append(title)
mouth1 = DataFrame({'Title': Series(Title), 'Review': Series(mouthrev)})
mouth1.to_csv('D:\\Review.csv')

2 answers

Anonymous user

Answer 1 · edited 2024-10-01 02:18:36

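This has nothing to do with Unicode. The [...] is the representation (repr) of a list. Each cell contains a list because you are collecting the text of several p elements: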

title = [g.get_text(strip=True) for g in title]
review = [g.get_text(strip=True) for g in review]
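To see where the brackets come from, here is a small stdlib-only sketch. It assumes a hard-coded list standing in for the get_text() results and uses Python's csv module instead of pandas (both are illustrative substitutions). csv.writer converts a non-string field with str(), which for a list is its repr, brackets and quotes included. Under Python 2 each item would additionally show a u'...' prefix, which is probably what looked like a "unicode format".

```python
import csv
import io

# Hypothetical stand-in for [g.get_text(strip=True) for g in review]
review = ["First paragraph", "Second paragraph"]

# str() on a list gives its repr: brackets and quotes included
print(str(review))  # ['First paragraph', 'Second paragraph']

# csv.writer stringifies non-string fields with str(), so the list's repr lands in the cell
buf = io.StringIO()
csv.writer(buf).writerow(["Title", review])
print(buf.getvalue())
```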

If you want a single string instead, you can join the texts of the several p elements into one line.

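A minimal sketch of the joining approach, using str.join with a space as separator. The helper name join_paragraphs and the separator are hypothetical choices; a hard-coded list stands in for the get_text() results, and the stdlib csv module stands in for pandas so the example runs offline.

```python
import csv
import io


def join_paragraphs(texts, sep=" "):
    """Join a list of paragraph texts into one string (hypothetical helper; sep is arbitrary)."""
    return sep.join(texts)


# Hypothetical stand-in for [g.get_text(strip=True) for g in parent.find_all('p')]
review_texts = ["First paragraph", "Second paragraph"]
review = join_paragraphs(review_texts)
print(review)  # First paragraph Second paragraph

# The cell now holds plain text, so no repr (no brackets or quotes) appears
buf = io.StringIO()
csv.writer(buf).writerow(["Title", review])
print(buf.getvalue())
```

In the question's loop, the same idea would be `review = ' '.join(g.get_text(strip=True) for g in review)`, and likewise for `title`, so each DataFrame cell holds a string rather than a list.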

The CSV writer then receives a string rather than a list, so it no longer has to coerce the data into a string via repr.

Anonymous user

Answer 2 · edited 2024-10-01 02:18:36

If I understand correctly, why not use str()?

review = [str(g.get_text(strip=True)) for g in review]

That should work.
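One point worth checking with this suggestion: str() is applied to each item inside the list comprehension, so review is still a list. In Python 3, get_text() already returns str, making the call a no-op. Under Python 2 it would drop the u prefix from each item, but the brackets would remain, and non-ASCII text would raise UnicodeEncodeError. A minimal sketch, assuming a hard-coded list in place of the get_text() results:

```python
# Hypothetical stand-in for the get_text(strip=True) results
texts = ["First paragraph", "Second paragraph"]

# str() applied to each item, as suggested above
review = [str(t) for t in texts]

print(type(review).__name__)  # list
print(review == texts)        # True: in Python 3, str() on a str returns an equal string
print(str(review))            # ['First paragraph', 'Second paragraph']
```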
