擅长:python、mysql、java
<p>这里有一个可选的<a href="/questions/tagged/lxml" class="post-tag" title="show questions tagged 'lxml'" rel="tag">lxml</a>及其<code>xpath</code>函数来搜索所有文本节点:</p>
<pre><code>from lxml import etree
html = '''<div class="thisText">
Poem <a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Raven</a>Once upon a midnight dreary, while I pondered, weak and weary... </div>
<div class="thisText">
In the greenest of our valleys By good angels tenanted..., part of<a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Haunted Palace</a>
</div>'''
root = etree.fromstring(html, etree.HTMLParser())
print(' '.join(root.xpath("//text()")))
</code></pre>
<p>它产生:</p>
^{pr2}$