Check for and remove duplicate child tags in XML

<main>
  <product>
    <article_nr>B00024J7C6</article_nr>
    <article_nr>44253</article_nr>
    <product_type>x</product_type>
    <product_type>x</product_type>
  </product>
  <product>
    <article_nr>B00024J7C7</article_nr>
    <product_type>y</product_type>
  </product>
</main>

1 answer

User

#1 · Posted 2024-10-03 13:30:13

Take a look at "Python remove duplicate elements from xml tree"; it may help you. Something like this:

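import xml.etree.ElementTree as ET

path = 'in.xml'
tree = ET.parse(path)
root = tree.getroot()

def elements_equal(e1, e2):
    # Equal means: same tag, text, tail, attributes, and recursively equal children.
    if type(e1) != type(e2):          # also covers the case where e2 is still None
        return False
    if e1.tag != e2.tag: return False
    if e1.text != e2.text: return False
    if e1.tail != e2.tail: return False   # note: whitespace in the tail counts too
    if e1.attrib != e2.attrib: return False
    if len(e1) != len(e2): return False
    return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))

for product in root:                  # iterate over the <product> elements
    prev = None                       # reset so elements are only compared within one product
    elems_to_remove = []
    for elem in product:
        # Only a child equal to the last kept sibling is treated as a duplicate.
        if elements_equal(elem, prev):
            print("found duplicate: %s" % elem.text)
            elems_to_remove.append(elem)
            continue
        prev = elem
    # Remove after the loop so the element is not modified while iterating over it.
    for elem_to_remove in elems_to_remove:
        product.remove(elem_to_remove)
tree.write("out.xml")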
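The loop above only catches a duplicate that directly follows the element it copies, and it treats differing whitespace in `tail` as a difference. If the duplicates may be non-adjacent, or the file is pretty-printed, a variant along these lines might work better. This is only a sketch: `element_key` and `remove_duplicate_children` are names made up for this example, the sample XML is the one from the question, and it assumes that "duplicate" means same tag, same stripped text, same attributes, and same children.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<main>
  <product>
    <article_nr>B00024J7C6</article_nr>
    <article_nr>44253</article_nr>
    <product_type>x</product_type>
    <product_type>x</product_type>
  </product>
  <product>
    <article_nr>B00024J7C7</article_nr>
    <product_type>y</product_type>
  </product>
</main>"""


def element_key(elem):
    """Hashable fingerprint: tag, stripped text, attributes, and children (tail ignored)."""
    return (
        elem.tag,
        (elem.text or "").strip(),
        tuple(sorted(elem.attrib.items())),
        tuple(element_key(child) for child in elem),
    )


def remove_duplicate_children(parent):
    """Remove every child whose fingerprint was already seen under this parent."""
    seen = set()
    removed = 0
    for child in list(parent):        # iterate over a copy because parent is mutated
        key = element_key(child)
        if key in seen:
            parent.remove(child)
            removed += 1
        else:
            seen.add(key)
    return removed


root = ET.fromstring(SAMPLE)
removed_total = sum(remove_duplicate_children(product) for product in root)
print("removed:", removed_total)
print(ET.tostring(root, encoding="unicode"))
```

With the sample from the question, the two `article_nr` elements have different text, so both are kept, and only the second `<product_type>x</product_type>` is dropped. If the goal is to keep just one element per tag name regardless of content, the key could be reduced to `elem.tag` alone.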
