<p>I finally found a solution to the problem. Here it is:</p>
<pre><code>import lxml.etree as ET

my_xml = """...xml content..."""
data = ET.XML(my_xml.encode('UTF-8'))

# First, remove the empty "<Unicode />" tags.
for target in data.xpath("//*[local-name() = 'Unicode'][not(text())]"):
    target.getparent().remove(target)

# Then remove elements that have neither children nor text. Two passes are
# needed: the first removes "<TextEquiv>" (left empty once "<Unicode />" is
# gone), and the second removes the "<TextLine>" that this leaves empty.
for _ in range(2):
    # Iterate over a snapshot so that removing elements does not disturb the traversal.
    for el in list(data.iter()):
        if len(el) or ''.join(t.strip() for t in el.itertext()):
            continue
        parent = el.getparent()
        if parent is not None:
            parent.remove(el)

print(ET.tostring(data, xml_declaration=True))
</code></pre>
<p>The idea comes from <a href="https://stackoverflow.com/questions/46299410/remove-xml-nodes-without-child-nodes-using-python">Remove xml nodes without child nodes using python</a>.</p>
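<p>If the nesting can be deeper than two levels, a fixed number of passes may not be enough. Below is a minimal sketch of one way to generalize the pruning step. It swaps in the standard library's <code>xml.etree.ElementTree</code> in place of lxml so that it runs without extra dependencies, and uses a child-to-parent map to stand in for <code>getparent()</code>. The helper name <code>prune_empty</code> and the sample XML are made up for illustration.</p>

```python
import xml.etree.ElementTree as ET


def prune_empty(root):
    """Remove every element that has no children and no non-whitespace text.

    Hypothetical helper, not part of any library. Returns the number of
    elements removed. The root element itself is never removed.
    """
    # ElementTree has no getparent(), so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    # root.iter() yields parents before their descendants, so the reversed
    # snapshot visits every element before its parent. A parent that becomes
    # empty is therefore caught in the same pass.
    for el in reversed(list(root.iter())):
        parent = parent_of.get(el)
        if parent is None:
            continue  # this is the root
        if len(el) == 0 and not (el.text or "").strip():
            parent.remove(el)  # the element's tail text is dropped with it
            removed += 1
    return removed


# Made-up sample that mirrors the structure discussed above.
sample = """<Page>
  <TextLine><TextEquiv><Unicode /></TextEquiv></TextLine>
  <TextLine><TextEquiv><Unicode>hello</Unicode></TextEquiv></TextLine>
</Page>"""

root = ET.fromstring(sample)
removed_count = prune_empty(root)
remaining_tags = [el.tag for el in root.iter()]
print(removed_count, remaining_tags)
```

<p>Because children are checked before their parents, a single pass handles the empty <code>Unicode</code>, <code>TextEquiv</code> and <code>TextLine</code> chain, however deep it is.</p>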