<p><a href="http://codespeak.net/lxml/" rel="nofollow">lxml</a> can help with this kind of task.</p>
<pre><code>html = """
<html>
<body>
<h1>MyWord</h1>
<a href="http://MyWord">MyWord</a>
<img src="images/MyWord.png"/>
<div class="MyWord">
<p>MyWord!</p>
MyWord
</div>
MyWord
</body><!-- MyWord -->
</html>
"""
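import re

import lxml.etree as etree

tree = etree.fromstring(html)
for elem in tree.iter():
    # elem.text is the text inside the element, before its first child
    if elem.text:
        elem.text = re.sub(r'MyWord', 'Myword', elem.text)
    # elem.tail is the text that follows the element's closing tag
    if elem.tail:
        elem.tail = re.sub(r'MyWord', 'Myword', elem.tail)
# Attribute values (href, src, class) are never touched by the loop
print(etree.tostring(tree, encoding='unicode'))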
</code></pre>
<p>The above prints:</p>
<pre><code><html>
<body>
<h1>Myword</h1>
<a href="http://MyWord">Myword</a>
<img src="images/MyWord.png"/>
<div class="MyWord">
<p>Myword!</p>
Myword
</div>
Myword
</body><!-- Myword -->
</html>
</code></pre>
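<p><strong>Note</strong>: The code above will need to be a little more complicated if you also need special handling for the contents of script tags, for example:</p>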
<pre><code><script>
var title = "MyWord"; // this should change to "Myword"
var hoverImage = "images/MyWord-hover.png"; // this should not change
</script>
</code></pre>
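<p>One possible way to handle that script case is sketched below. It is only a heuristic, not a JavaScript parser: the helper <code>replace_in_script</code> and the patterns <code>STRING_LITERAL</code> and <code>PATH_LIKE</code> are hypothetical names introduced here, and the rule "a string containing a slash or ending in a file extension is a path" is an assumption you would likely need to adapt.</p>

```python
import re

# Assumption: only simple double-quoted literals without escapes are handled.
STRING_LITERAL = re.compile(r'"([^"\\]*)"')
# Assumption: a literal is a path or URL if it contains a slash
# or ends in something that looks like a file extension.
PATH_LIKE = re.compile(r'/|\.\w{2,4}$')


def replace_in_script(text, old='MyWord', new='Myword'):
    """Replace `old` with `new` inside string literals that do not look like paths."""
    def fix(match):
        value = match.group(1)
        if PATH_LIKE.search(value):
            return match.group(0)  # leave paths and URLs unchanged
        return '"' + value.replace(old, new) + '"'
    return STRING_LITERAL.sub(fix, text)


script = (
    'var title = "MyWord";\n'
    'var hoverImage = "images/MyWord-hover.png";\n'
)
print(replace_in_script(script))
```

<p>Inside the loop over <code>tree.iter()</code>, you could then call <code>replace_in_script(elem.text)</code> when <code>elem.tag == 'script'</code> and keep the plain <code>re.sub</code> for every other element.</p>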