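<p>This isn't a proper answer.</p>
<p>What I would mention is that parsing HTML with regex usually makes life unnecessarily difficult. It's better to use a proper parsing tool such as BeautifulSoup, lxml or scrapy.</p>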
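<p>To illustrate the kind of trouble I mean, here is a minimal sketch. The pattern is my own naive attempt, not something taken from the question: a non-greedy regex stops at the first closing tag, so a nested <code>div</code> cuts the match short and leaves markup in the result.</p>

```python
import re

# One of the sample lines, containing a nested div.
line = ('<div>Created and <div style="font-size: 1">managed</div> '
        'websites for clients to communicate securely</div>')

# Naive, hypothetical pattern: capture whatever sits between <div ...> and </div>.
naive = re.compile(r'<div[^>]*>(.*?)</div>')

# The lazy group ends at the FIRST </div>, which belongs to the inner div,
# so the capture is truncated and still contains the inner opening tag.
captured = naive.search(line).group(1)
print(captured)
```

<p>You can keep patching the pattern, but every new level of nesting or attribute variation needs another patch, which is why I'd reach for a parser instead.</p>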
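<p>Recovering the text from each of the lines you gave as examples is easy. I'm assuming each one is part of a larger construct, so I've wrapped each of them in a <code>div</code>.</p>
<p>Here I use BeautifulSoup to get the text from each of your lines.</p>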
<pre><code>>>> import bs4
>>> for line in open('temp.htm').readlines():
...     line = line.strip()
...     print(line)
...     soup = bs4.BeautifulSoup(line, 'lxml')
...     soup.find('div').text
...
<div>Created and <div style="font-size: 1">managed</div> websites for clients to communicate securely</div>
'Created and managed websites for clients to communicate securely'
<div>Created and <div style="font-size: 2">managed websites</div> for clients to communicate securely</div>
'Created and managed websites for clients to communicate securely'
<div>Created and managed websites for clients to <div style="font-size: 3">communicate</div> securely</div>
'Created and managed websites for clients to communicate securely'
<div><div style="font-size: 4">Created</div> and managed websites for clients to communicate securely</div>
'Created and managed websites for clients to communicate securely'
</code></pre>
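<p>If installing BeautifulSoup isn't an option, roughly the same result can be had from the standard library. This is only a sketch built on <code>html.parser.HTMLParser</code>, not what I ran above; the <code>TextCollector</code> class and the <code>text_of</code> helper are names I made up for illustration.</p>

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps the text nodes and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called once for each run of text between tags.
        self.parts.append(data)


def text_of(fragment):
    """Return the concatenated text content of an HTML fragment."""
    collector = TextCollector()
    collector.feed(fragment)
    collector.close()  # flush any text still buffered by the parser
    return ''.join(collector.parts)


sample = ('<div>Created and <div style="font-size: 1">managed</div> '
          'websites for clients to communicate securely</div>')
print(text_of(sample))
```

<p>Unlike <code>soup.find('div').text</code>, this collects the text of the whole fragment rather than of the first <code>div</code> only, which makes no difference for lines shaped like yours.</p>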
<p>Unfortunately, I don't see how, in general, the input lines are supposed to map to the output HTML.</p>