Filtering XML with Python

1 answer

User

#1 · Posted 2024-05-08 07:28:04

This uses xml.etree.ElementTree from the standard library:

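import xml.etree.ElementTree as xee

data = '''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
doc = xee.fromstring(data)

# findall() returns a list, so removing children inside the loop is safe.
for tag in doc.findall('node2'):
    # .get() returns None instead of raising KeyError when a1 is missing.
    if tag.get('a1') == 'x2':
        doc.remove(tag)
# encoding='unicode' makes tostring() return str rather than bytes.
print(xee.tostring(doc, encoding='unicode'))
# <node1>
#   <node2 a1="x1"> ... </node2>
#   <node2 a1="x1"> ... </node2>
# </node1>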
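
The attribute test can also move into the path itself, because ElementTree's limited XPath support includes the `[@attrib='value']` predicate. A minimal sketch of that variant; the helper name `remove_children` is made up for illustration and is not part of the library, and the third `node2` (with no `a1` attribute) is added here only to show that it is left alone:

```python
import xml.etree.ElementTree as ET

def remove_children(parent, path):
    """Remove every direct child of `parent` matched by `path`; return the count.

    Hypothetical helper, not an ElementTree API. `path` must select direct
    children only, otherwise parent.remove() raises ValueError.
    """
    matches = parent.findall(path)  # a list, so removing inside the loop is safe
    for child in matches:
        parent.remove(child)
    return len(matches)

data = '''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2> ... </node2>
</node1>
'''
doc = ET.fromstring(data)
removed = remove_children(doc, "node2[@a1='x2']")
remaining = [child.get('a1') for child in doc]
print(removed, remaining)  # 1 ['x1', None]
```

Because the predicate only matches elements that carry the attribute, no explicit check for a missing `a1` is needed.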

This uses lxml, which is not in the standard library but has a more powerful syntax:

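import lxml.etree

data = '''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
doc = lxml.etree.XML(data)
# The predicate selects the direct node2 children whose a1 attribute is "x2".
for e in doc.findall('node2[@a1="x2"]'):
    doc.remove(e)
# encoding='unicode' makes tostring() return str rather than bytes.
print(lxml.etree.tostring(doc, encoding='unicode'))

# <node1>
#   <node2 a1="x1"> ... </node2>
#   <node2 a1="x1"> ... </node2>
# </node1>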

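Edit: if node2 is nested more deeply in the XML, you can iterate over every element, treat each one as a potential parent, and check whether any of its children is a matching node2 element; if so, remove it: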

Using only xml.etree.ElementTree:

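import xml.etree.ElementTree as xee

doc = xee.fromstring(data)  # data: the XML string to filter
# getiterator() was removed in Python 3.9; iter() replaces it.
# list() snapshots the elements so removals cannot disturb the traversal.
for parent in list(doc.iter()):
    for child in parent.findall('node2'):
        if child.get('a1') == 'x2':
            parent.remove(child)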
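
ElementTree elements hold no reference to their parent, which is why the loop above walks every potential parent. Another common approach is to build a child-to-parent map first and remove each match through it. A sketch under assumptions: the nested document, the `wrapper` element, and the names `parent_of` and `targets` are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: node2 appears at two different depths.
data = '''\
<node1>
  <node2 a1="x2"> top </node2>
  <wrapper>
    <node2 a1="x1"> keep </node2>
    <node2 a1="x2"> nested </node2>
  </wrapper>
</node1>
'''
root = ET.fromstring(data)

# Map every element to its parent; the root itself has no entry.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Collect the matches first so the tree is not modified while it is walked.
targets = [el for el in root.iter('node2') if el.get('a1') == 'x2']
for el in targets:
    parent_of[el].remove(el)

left = [el.get('a1') for el in root.iter('node2')]
print(left)  # ['x1']
```

The map costs one full pass over the tree, but afterwards each removal is a direct lookup. It would not handle a matching root element, since the root has no parent to remove it from.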

Using lxml:

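import lxml.etree

doc = lxml.etree.XML(data)  # data: the XML string to filter
# list() snapshots the elements so removals cannot disturb the traversal.
for parent in list(doc.iter('*')):
    # findall() catches every matching child; find() returns only the first.
    for child in parent.findall('node2[@a1="x2"]'):
        parent.remove(child)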
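
With lxml, full XPath and `getparent()` can make the outer loop over parents unnecessary. A sketch of that alternative, assuming lxml is installed; the nested document and its `wrapper` element are invented for this example.

```python
import lxml.etree

# Hypothetical input: node2 appears at two different depths.
data = '''\
<node1>
  <node2 a1="x2"> top </node2>
  <wrapper>
    <node2 a1="x1"> keep </node2>
    <node2 a1="x2"> nested </node2>
  </wrapper>
</node1>
'''
doc = lxml.etree.XML(data)

# '//' selects node2 elements at any depth. xpath() returns a list,
# so removing elements inside the loop is safe.
for el in doc.xpath('//node2[@a1="x2"]'):
    el.getparent().remove(el)

left = [el.get('a1') for el in doc.iter('node2')]
print(left)  # ['x1']
```

As with the parent-map approach, this assumes the root element itself is not a match, because `getparent()` returns None for the root.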
