How to remove newlines from the output of BeautifulSoup's get_text() method

Posted 2024-05-19 10:29:01


After scraping a web page, I get the following output:

In [50]: text
Out[50]:
['\nAbsolute FreeBSD, 2nd Edition\n',
'\nAbsolute OpenBSD, 2nd Edition\n',
'\nAndroid Security Internals\n',
'\nApple Confidential 2.0\n',
'\nArduino Playground\n',
'\nArduino Project Handbook\n',
'\nArduino Workshop\n',
'\nArt of Assembly Language, 2nd Edition\n',
'\nArt of Debugging\n',
'\nArt of Interactive Design\n',]

I need to remove the newline characters (\n) from the strings in the list above while looping over the tags. Here is my code:

text = []
for name in web_text:
   a = name.get_text()
   text.append(a)

3 answers

You can use a list comprehension:

stripped_text = [t.strip() for t in text]

which outputs:

>>> stripped_text
['Absolute FreeBSD, 2nd Edition', 'Absolute OpenBSD, 2nd Edition', 'Android Security Internals', 'Apple Confidential 2.0', 'Arduino Playground', 'Arduino Project Handbook', 'Arduino Workshop', 'Art of Assembly Language, 2nd Edition', 'Art of Debugging', 'Art of Interactive Design']
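Note that str.strip() with no argument removes all leading and trailing whitespace, not only newlines, and never touches characters in the middle of the string. The short sketch below uses a made-up title string (an assumption, not data from the page) to compare strip(), strip("\n") and replace():

```python
# A made-up title padded with newlines and spaces, for illustration only.
raw = "\n  Art of Debugging \n"

# strip() with no argument removes ALL leading/trailing whitespace
# (newlines, spaces, tabs), but leaves the inner text alone.
print(repr(raw.strip()))        # 'Art of Debugging'

# strip("\n") removes only newline characters from the two ends,
# so the surrounding spaces survive.
print(repr(raw.strip("\n")))    # '  Art of Debugging '

# A newline in the middle is untouched by strip(); replace() is needed for it.
multi = "Art of\nDebugging\n"
print(repr(multi.replace("\n", " ").strip()))  # 'Art of Debugging'
```

For the scraped titles above, plain strip() is usually what you want, since the only unwanted characters are at the ends.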

Just as you would call strip() on any other string:

text = []
for name in web_text:
   a = name.get_text().strip()
   text.append(a)
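The question does not show the page's HTML, so the sketch below assumes hypothetical markup (li elements with class "title", whose text is wrapped in newlines) to reproduce the original output and then apply the .strip() fix. It needs the beautifulsoup4 package.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the newline-padded titles from the question.
html = """
<ul>
<li class="title">
Absolute FreeBSD, 2nd Edition
</li>
<li class="title">
Arduino Workshop
</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
web_text = soup.find_all("li", class_="title")

# get_text() returns the text exactly as it appears in the source,
# including the newline after <li> and the one before </li>.
raw = [tag.get_text() for tag in web_text]
print(raw)   # ['\nAbsolute FreeBSD, 2nd Edition\n', '\nArduino Workshop\n']

# Calling .strip() on each result removes that surrounding whitespace.
text = [tag.get_text().strip() for tag in web_text]
print(text)  # ['Absolute FreeBSD, 2nd Edition', 'Arduino Workshop']
```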

Instead of calling .strip() explicitly, you can pass the strip argument:

a = name.get_text(strip=True)

This also strips the extra whitespace and newlines around the text of any child elements.
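One difference worth knowing: get_text(strip=True) strips each text fragment separately and, by default, joins the fragments with an empty string, so words split across child tags can get glued together. Passing separator=" " avoids that. The element below is a made-up example, not markup from the question:

```python
from bs4 import BeautifulSoup

# Made-up element whose text is split across a child <b> tag.
soup = BeautifulSoup("<p>\n  Art of <b>Debugging</b>\n</p>", "html.parser")
p = soup.p

# The text fragments are "\n  Art of ", "Debugging" and "\n".
print(repr(p.get_text()))                           # '\n  Art of Debugging\n'
print(repr(p.get_text().strip()))                   # 'Art of Debugging'

# strip=True strips every fragment, drops empty ones, then joins with "".
print(repr(p.get_text(strip=True)))                 # 'Art ofDebugging'

# A separator restores the space between the stripped fragments.
print(repr(p.get_text(separator=" ", strip=True)))  # 'Art of Debugging'
```

For simple elements like the book titles in the question, which contain a single text node, get_text(strip=True) and get_text().strip() give the same result.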
