<p>Another approach is to parse the string with an HTML parser, such as <a href="http://lxml.de" rel="nofollow noreferrer"><code>lxml</code></a>.</p>
<p>For example, you can use XPath to select everything between the <code>b</code> tag and the <code>br</code> tag by checking each node's <code>preceding</code> and <code>following</code> siblings:</p>
<pre><code>from lxml.html import fromstring

snippets = [
    """<b>Carson Daly</b>: <a href="http://rads.stackoverflow.com/amzn/click/B009DA74O8">Ben Schwartz</a>, Soko, Jacob Escobedo (R 2/28/14)<br>'""",
    """<b>Carson Daly</b>: Wil Wheaton, the Birds of Satan, Courtney Kemp Agboh<br>"""
]

for html in snippets:
    tree = fromstring(html)
    results = ''
    # Select every node (element or text) that comes after <b>Carson Daly</b>
    # and before a <br> at the same level.
    for element in tree.xpath('//node()[preceding-sibling::b="Carson Daly" and following-sibling::br]'):
        if not isinstance(element, str):
            # An element such as <a>: take all of its text content.
            results += element.text_content().strip()
        else:
            # A text node: drop the ':' that follows the <b> label.
            text = element.strip(':')
            if text:
                results += text.strip()
    print(results.split(', '))
</code></pre>
<p>It prints:</p>
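<pre><code>['Ben Schwartz', 'Soko', 'Jacob Escobedo (R 2/28/14)']
['Wil Wheaton', 'the Birds of Satan', 'Courtney Kemp Agboh']
</code></pre>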
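<p>If you need the same extraction for other labels, the XPath can be wrapped in a small helper. This is a sketch under stated assumptions, not part of the original answer: <code>names_after</code> and <code>SIBLINGS_XPATH</code> are hypothetical names, and it assumes each list is a flat run of sibling nodes between the <code>b</code> label and a <code>br</code>. The label is passed to lxml's <code>xpath()</code> as a keyword argument, which binds it to the XPath variable <code>$label</code> instead of hard-coding it in the expression. It also splits on <code>,</code> and strips each piece, so irregular spacing around commas is tolerated.</p>

```python
from lxml.html import fromstring

# Same expression as in the answer, but the label text is supplied
# through the XPath variable $label (bound via a keyword argument).
SIBLINGS_XPATH = '//node()[preceding-sibling::b=$label and following-sibling::br]'


def names_after(html, label):
    """Hypothetical helper: return the comma-separated names that sit
    between <b>label</b> and the following <br> in an HTML fragment.

    Assumes the names form a flat run of sibling nodes (text, <a>, ...).
    """
    tree = fromstring(html)
    parts = []
    for node in tree.xpath(SIBLINGS_XPATH, label=label):
        if isinstance(node, str):
            # Text node: remove a ':' that separates the label from the list.
            parts.append(node.strip(':'))
        else:
            # Element such as <a>: keep all of its text content.
            parts.append(node.text_content())
    joined = ''.join(parts).strip()
    # Split on commas and drop empty pieces (e.g. when nothing matched).
    return [name.strip() for name in joined.split(',') if name.strip()]
```

<p>For the two sample strings, <code>names_after(html, 'Carson Daly')</code> should return the same lists as the loop in the answer. Binding the label as a variable also means a name containing quotes does not need to be escaped inside the XPath string.</p>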