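<p>Don't use the <code>re</code> module; take a look at the <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/" rel="nofollow noreferrer">bs4</a> library instead.</p>
<p>It is an XML/HTML parser, so you can get everything between the tags.</p>
<p>In your case, it would look like this:</p>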
<pre><code>from bs4 import BeautifulSoup
xml_text = '<S sid ="2" ssid = "2">It differs from previous machine learning-based NERs in that it uses information from the whole document to classify each word, with just one classifier.</S><S sid ="3" ssid = "3">Previous work that involves the gathering of information from the whole document often uses a secondary classifier, which corrects the mistakes of a primary sentence- based classifier.</S>'
# The 'lxml' HTML parser lowercases tag names, so search for 's' rather than 'S'
text_soup = BeautifulSoup(xml_text, 'lxml')
matches = text_soup.find_all('s', attrs={'sid': '2'})
# find_all returns a list of tags; get_text() extracts the text of each one
output = [tag.get_text() for tag in matches]
</code></pre>
<p>The output will contain the text:</p>
<blockquote>
<p>It differs from previous machine learning-based NERs in that it uses information from the whole document to classify each word, with just one classifier.</p>
</blockquote>
<p>Also, if you only want to strip the HTML tags:</p>
<pre><code>import re
xml_text = '<S sid ="2" ssid = "2">It differs from previous machine learning-based NERs in that it uses information from the whole document to classify each word, with just one classifier.</S><S sid ="3" ssid = "3">Previous work that involves the gathering of information from the whole document often uses a secondary classifier, which corrects the mistakes of a primary sentence- based classifier.</S>'
# Non-greedy match: removes everything from each '<' up to the next '>'
clean_text = re.sub(r'<.*?>', '', xml_text)
</code></pre>
<p>That will do the job.</p>
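<p>One thing to watch with the regex approach: replacing tags with an empty string glues the end of one sentence to the start of the next. A minimal sketch of a variant that substitutes a space and then collapses whitespace is shown here. The names <code>strip_tags</code> and <code>TAG_PATTERN</code> and the shortened sample string are illustrative assumptions, not part of the original data, and the pattern would also remove any literal <code>&lt;...&gt;</code> that appears in the text itself.</p>

```python
import re

# Hypothetical helper names, chosen for illustration only.
# Matches a '<', any run of characters that are not '>', then the closing '>'.
TAG_PATTERN = re.compile(r'<[^>]*>')


def strip_tags(markup):
    """Remove tag-like spans and normalise the whitespace that remains."""
    # Replace each tag with a space so adjacent elements do not run together.
    text = TAG_PATTERN.sub(' ', markup)
    # split() with no arguments drops leading, trailing and repeated whitespace.
    return ' '.join(text.split())


# Shortened sample in the same shape as the data above (assumed for brevity).
sample = '<S sid ="2" ssid = "2">First sentence.</S><S sid ="3" ssid = "3">Second sentence.</S>'
print(strip_tags(sample))
```

<p>If you need the text of one specific element, for example only <code>sid="2"</code>, the parser-based approach is probably the safer choice.</p>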