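<p>If your code takes longer than expected, you can always start by adding a few print statements to get a better idea of where the time is being spent.</p>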
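<p>As a minimal sketch of that kind of timing output, the snippet below wraps a step in a small helper that prints how long it took. The names <code>timed</code> and <code>slow_step</code> are hypothetical and only stand in for whatever part of your script you suspect is slow.</p>

```python
import time


def timed(label, func, *args):
    """Run func(*args), print the elapsed wall-clock time, and return the result."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result


# Hypothetical stand-in for a slow step, such as parsing a large file.
def slow_step(n):
    return sum(i * i for i in range(n))


total = timed("slow_step", slow_step, 100_000)
```

<p>Wrapping each stage (reading, parsing, deleting) this way usually shows quickly which one dominates the run time.</p>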
<p>For your task, a single loop is sufficient. Iterate over all "segment" elements in the XML file and remove a segment whenever its name is listed in del_names.txt.</p>
<p>To make the name lookups faster, I converted the list of names to a <code>set</code>.</p>
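<p>To illustrate why the <code>set</code> helps, here is a rough comparison of membership tests on a list and on a set. The data is made up (10,000 numeric strings), and the exact timings will vary by machine, so treat it as a sketch rather than a benchmark.</p>

```python
import timeit

# Made-up data: 10,000 names stored once as a list and once as a set.
names_list = [str(i) for i in range(10_000)]
names_set = set(names_list)

# "9999" is the last list element, so the list has to be scanned to the end,
# while the set finds it with a hash lookup.
list_time = timeit.timeit(lambda: "9999" in names_list, number=1_000)
set_time = timeit.timeit(lambda: "9999" in names_set, number=1_000)

print(f"list: {list_time:.4f} s, set: {set_time:.4f} s")
```

<p>With many segments and many names to delete, the list scan is repeated for every segment, which is why the conversion pays off.</p>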
<pre class="lang-py prettyprint-override"><code>from lxml import etree

# Read the XML document as text.
with open("g.xml", "r") as xml_file:
    xml_data = xml_file.read()
print("read xml data")

# A set gives fast membership tests for the names to delete.
with open('del_names.txt', 'r') as file:
    names_to_delete = set(file.read().splitlines())
print("read text data")

# Parse from bytes: lxml rejects str input that carries an encoding declaration.
tree = etree.XML(xml_data.encode())

# Single pass: remove every segment whose name is listed for deletion.
for segment in tree.xpath("*//segment"):
    name = segment.attrib.get("name")
    if name in names_to_delete:
        print(f"will delete segment '{name}'")
        segment.getparent().remove(segment)

print(" result ".center(80, "="))
new_xml = etree.tostring(tree, encoding="unicode", pretty_print=True)
print(new_xml)
</code></pre>
<p><strong>Output:</strong></p>
<pre><code>read xml data
read text data
will delete segment '1'
will delete segment '3'
==================================== result ====================================
<?xml version='1.0' encoding='ASCII'?>
<corpus name="corpus">
  <recording audio="audio.wav" name="first audio">
    <segment name="2" start="2" end="4">
      <orth>some text 2</orth>
    </segment>
  </recording>
</corpus>
</code></pre>
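<p>If lxml is not available, the same single-loop idea can be sketched with the standard library's <code>xml.etree.ElementTree</code>. This is a variant, not the code that produced the output above. The sample document is a hypothetical reconstruction modeled on that output (the attributes of segments 1 and 3 are assumed), and <code>remove_segments</code> is an illustrative helper name.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, modeled on the output shown above.
SAMPLE_XML = """<corpus name="corpus">
  <recording audio="audio.wav" name="first audio">
    <segment name="1" start="0" end="2"><orth>some text 1</orth></segment>
    <segment name="2" start="2" end="4"><orth>some text 2</orth></segment>
    <segment name="3" start="4" end="6"><orth>some text 3</orth></segment>
  </recording>
</corpus>"""


def remove_segments(root, names_to_delete):
    """Remove every <segment> whose name is in names_to_delete; return the count."""
    removed = 0
    # ElementTree elements have no getparent(), so visit each potential parent
    # and remove matching children from it directly.
    for parent in list(root.iter()):
        # findall() returns a list, so removing while looping over it is safe.
        for segment in parent.findall("segment"):
            if segment.get("name") in names_to_delete:
                parent.remove(segment)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE_XML)
removed = remove_segments(root, {"1", "3"})
remaining = [seg.get("name") for seg in root.iter("segment")]
print(removed, remaining)
```

<p>Keeping the deletion logic in a function that takes an already parsed tree and a set of names also makes it easy to test without touching any files.</p>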