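<p>Although ElementTree is very easy to use for most XML processing tasks, it is inconvenient for mixed content (text interleaved with child elements). I suggest using a DOM parser instead:</p>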
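<pre><code>from xml.dom import minidom
import re

# Split on runs of whitespace; leading/trailing whitespace yields empty
# strings at the ends, which lets us restore the spaces when re-joining.
ws_split = re.compile(r'\s+', re.U).split

# NOTE: `glossary` must be defined by the caller (e.g. a set of words).

def processNode(parent):
    doc = parent.ownerDocument
    # Iterate over a copy, because the loop inserts and replaces children.
    for node in parent.childNodes[:]:
        if node.nodeType == node.TEXT_NODE:
            words = ws_split(node.nodeValue)
            new_words = []
            changed = False
            for word in words:
                if word in glossary:
                    # Emit the text collected so far (with a trailing space),
                    # then the glossary word wrapped in a b element.
                    text = ' '.join(new_words + [''])
                    parent.insertBefore(doc.createTextNode(text), node)
                    b = doc.createElement('b')
                    b.appendChild(doc.createTextNode(word))
                    parent.insertBefore(b, node)
                    # Start the next chunk with a leading space.
                    new_words = ['']
                    changed = True
                else:
                    new_words.append(word)
            if changed:
                # Replace the original text node with the remaining text.
                text = ' '.join(new_words)
                print(text)  # debug output
                parent.replaceChild(doc.createTextNode(text), node)
        else:
            # Not a text node: recurse into its children.
            processNode(node)
</code></pre>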
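<p>For illustration, here is a minimal, self-contained Python 3 sketch of how the function might be called. The <code>glossary</code> set and the sample document are made up for this example and are not part of the original answer; the function body is the same logic as above with the debug print removed.</p>

```python
import re
from xml.dom import minidom

ws_split = re.compile(r'\s+').split

# Hypothetical glossary, chosen only for this example.
glossary = {'DOM', 'XML'}

def processNode(parent):
    doc = parent.ownerDocument
    for node in parent.childNodes[:]:  # copy: children change while looping
        if node.nodeType == node.TEXT_NODE:
            new_words = []
            changed = False
            for word in ws_split(node.nodeValue):
                if word in glossary:
                    text = ' '.join(new_words + [''])
                    parent.insertBefore(doc.createTextNode(text), node)
                    b = doc.createElement('b')
                    b.appendChild(doc.createTextNode(word))
                    parent.insertBefore(b, node)
                    new_words = ['']
                    changed = True
                else:
                    new_words.append(word)
            if changed:
                text = ' '.join(new_words)
                parent.replaceChild(doc.createTextNode(text), node)
        else:
            processNode(node)

# Made-up mixed-content sample: text and an inline element side by side.
doc = minidom.parseString(
    '<p>Parsing XML with a DOM parser <i>keeps XML order</i></p>'
)
processNode(doc.documentElement)
result = doc.documentElement.toxml()
print(result)
```

<p>Note that matching is by exact whitespace-delimited token, so a word followed by punctuation (for example <code>XML,</code>) would not be matched by this sketch.</p>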
<p>I also use a regexp to split the words, so that they do not stick together:</p>
^{pr2}$
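<p>As a rough sketch of why the regexp split matters (the sample string is invented for this example): splitting on <code>\s+</code> keeps an empty string wherever the text starts or ends with whitespace, so re-joining with spaces preserves the gap next to a neighbouring element, whereas <code>str.split()</code> silently drops it.</p>

```python
import re

ws_split = re.compile(r'\s+').split

# Text node that sits between two inline elements, so its edge spaces matter.
text = ' in the glossary '

regex_parts = ws_split(text)   # keeps '' at both ends
plain_parts = text.split()     # drops edge whitespace entirely

rejoined_regex = ' '.join(regex_parts)
rejoined_plain = ' '.join(plain_parts)

print(repr(rejoined_regex))
print(repr(rejoined_plain))
```

<p>With the plain split, the text would butt up against the adjacent elements once the node is rebuilt.</p>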