<p>You might want to consider using <a href="http://codespeak.net/lxml/" rel="nofollow">lxml</a> instead of BeautifulSoup.
lxml lets you find elements with XPath:</p>
<p>Start with this boilerplate setup:</p>
<pre><code>import lxml.html as LH
import re
html = """
<p>
If everybody minded their own business, the world would go around a great deal faster than it does.
</p>
<p>
Who in the world am I? Ah, that's the great puzzle.
</p>
"""
doc = LH.fromstring(html)
</code></pre>
<p>This finds the text of every <code>&lt;p&gt;</code> tag that contains the string <code>world</code>:</p>
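<pre><code>print(doc.xpath('//p[contains(text(),"world")]/text()'))
['\nIf everybody minded their own business, the world would go around a great deal faster than it does.\n', "\nWho in the world am I? Ah, that's the great puzzle.\n"]
</code></pre>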
<p>This finds the text of every <code>&lt;p&gt;</code> tag that contains both <code>world</code> and <code>puzzle</code>:</p>
<pre><code>print(doc.xpath('//p[contains(text(),"world") and contains(text(),"puzzle")]/text()'))
["\nWho in the world am I? Ah, that's the great puzzle.\n"]
</code></pre>
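<p>If you need to match an arbitrary number of words, the same query can be built programmatically. The following is only a minimal sketch: <code>paragraph_texts_containing</code> is a hypothetical helper name, not part of lxml, and it assumes the same sample document as above. It passes each word as an XPath variable (which lxml's <code>xpath()</code> accepts as keyword arguments) so that quotes inside a word cannot break the expression.</p>

```python
import lxml.html as LH

html = """
<p>
If everybody minded their own business, the world would go around a great deal faster than it does.
</p>
<p>
Who in the world am I? Ah, that's the great puzzle.
</p>
"""
doc = LH.fromstring(html)


def paragraph_texts_containing(doc, *words):
    """Hypothetical helper: text of <p> elements whose text contains every word."""
    if not words:
        raise ValueError("at least one word is required")
    # Bind each word to an XPath variable ($w0, $w1, ...) instead of
    # interpolating it into the expression, so quoting is never an issue.
    variables = {f"w{i}": word for i, word in enumerate(words)}
    predicate = " and ".join(f"contains(text(), ${name})" for name in variables)
    return doc.xpath(f"//p[{predicate}]/text()", **variables)


print(paragraph_texts_containing(doc, "world", "puzzle"))
```

<p>Note that in XPath 1.0, <code>contains(text(), ...)</code> only tests the first text node of each <code>&lt;p&gt;</code>. If your paragraphs contain nested tags, <code>contains(., ...)</code> tests the element's full text instead.</p>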