from BeautifulSoup import BeautifulSoup
html = '''<div class="thisText">
Poem <a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Raven</a>Once upon a midnight dreary, while I pondered, weak and weary... </div>
<div class="thisText">
In the greenest of our valleys By good angels tenanted..., part of<a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Haunted Palace</a>
</div>'''
soup = BeautifulSoup(html)
all_poems = soup.findAll("div", {"class": "thisText"})
for poems in all_poems:
    print(poems.text)
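I have this sample code, but I can't figure out how to add spaces around the removed tags so that the text inside <a href...> stays readable when the output is formatted, instead of running together like this: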
PoemThe RavenOnce upon a midnight dreary, while I pondered, weak and weary...
In the greenest of our valleys By good angels tenanted..., part ofThe Haunted Palace
Here is an alternative that uses lxml and its xpath function to search for all text nodes under each div and join them with a space.
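A minimal sketch of the lxml approach, assuming lxml is installed and reusing the markup from the question. The helper name div_texts is made up for illustration, and stripping each text node before joining is a choice made here rather than something the answer specifies.

```python
from lxml import etree

html = '''<div class="thisText">
Poem <a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Raven</a>Once upon a midnight dreary, while I pondered, weak and weary... </div>
<div class="thisText">
In the greenest of our valleys By good angels tenanted..., part of<a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Haunted Palace</a>
</div>'''


def div_texts(markup):
    """Return the text of each div.thisText with one space between text nodes.

    Hypothetical helper, not part of lxml.
    """
    root = etree.HTML(markup)  # parse with lxml's lenient HTML parser
    results = []
    for div in root.xpath('//div[@class="thisText"]'):
        # './/text()' yields every text node under the div, in document order
        pieces = [t.strip() for t in div.xpath('.//text()')]
        # drop whitespace-only nodes, then join the rest with a single space
        results.append(' '.join(p for p in pieces if p))
    return results


for text in div_texts(html):
    print(text)
```

Because the join happens on individual text nodes, the link text is separated from the surrounding prose even where the markup has no whitespace around the <a> tag.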
One option is to find all text nodes in each div and join them with a space.
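A rough sketch of that idea, assuming beautifulsoup4 (imported as bs4) and the built-in html.parser backend. The variable names joined and pieces are illustrative, and filtering out whitespace-only nodes is an extra step added here.

```python
from bs4 import BeautifulSoup

html = '''<div class="thisText">
Poem <a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Raven</a>Once upon a midnight dreary, while I pondered, weak and weary... </div>
<div class="thisText">
In the greenest of our valleys By good angels tenanted..., part of<a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Haunted Palace</a>
</div>'''

soup = BeautifulSoup(html, "html.parser")

joined = []
for poem in soup.find_all("div", {"class": "thisText"}):
    # find_all(string=True) returns every text node inside the div
    pieces = (node.strip() for node in poem.find_all(string=True))
    # skip whitespace-only nodes and put one space between the rest
    joined.append(" ".join(p for p in pieces if p))

for line in joined:
    print(line)
```

Stripping each node first avoids doubled spaces where the markup already had whitespace next to a tag.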
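Also, you are using the beautifulsoup3 package, which is outdated and unmaintained. Upgrade to beautifulsoup4 and replace:
from BeautifulSoup import BeautifulSoup
with:
from bs4 import BeautifulSoup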
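get_text() in beautifulsoup4 has an optional argument named separator. The separator string is placed between the pieces of text when they are joined, so passing separator=" " puts a space wherever a tag was removed.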
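A short sketch of how separator might be used, assuming beautifulsoup4 with the built-in html.parser backend. The strip=True argument is an addition made here (it trims each text node and skips empty ones), and the list name lines is illustrative.

```python
from bs4 import BeautifulSoup

html = '''<div class="thisText">
Poem <a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Raven</a>Once upon a midnight dreary, while I pondered, weak and weary... </div>
<div class="thisText">
In the greenest of our valleys By good angels tenanted..., part of<a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Haunted Palace</a>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# separator=" " inserts a space between adjacent text nodes;
# strip=True trims each node so existing whitespace is not doubled
lines = [
    poem.get_text(separator=" ", strip=True)
    for poem in soup.find_all("div", class_="thisText")
]

for line in lines:
    print(line)
```

Without strip=True the separator is still inserted, but leading newlines and trailing spaces from the markup are kept in the result.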