Python: extracting from XML while keeping the same schema

<?xml version="1.0" encoding="utf-8" ?>
<ROOT>
  <facturic id_user="18446195"><artfacturic/></facturic>
  <facturic id_user="18446195"><artfacturic/></facturic>
  <facturic id_user="34259554"><artfacturic/></facturic>
</ROOT>

2 Answers

User

Answer #1 · edited 2024-09-28 19:05:29

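Here is how I would do it: use a collections.defaultdict to collect the nodes for each id_user value. Then post-process the resulting dictionary and write the duplicates to separate files. Using lxml.etree: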

from collections import defaultdict
from lxml import etree

tree = etree.parse("input.xml")

# Group <facturic> nodes by their id_user attribute
facturics = defaultdict(list)

for node in tree.xpath(".//facturic"):
    facturics[node.attrib["id_user"]].append(node)

for user_id, nodes in facturics.items():
    if len(nodes) > 1:  # save duplicates
        # lxml serializes to bytes, so open the file in binary mode
        with open("{user_id}.xml".format(user_id=user_id), "wb") as output_file:
            root = etree.Element("ROOT")
            for node in nodes:
                root.append(node)  # moves the node out of the original tree
            etree.ElementTree(root).write(output_file, pretty_print=True)

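After running this code, a new file named 18446195.xml is created in the current directory. It contains a ROOT element holding the two facturic nodes whose id_user is 18446195.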
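For a quick check without reading or writing files, the same grouping idea can be sketched with the standard library's xml.etree.ElementTree in place of lxml. The SAMPLE string and the group_duplicates helper are illustrative names, not part of the answer above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample document from the question, kept in memory (illustrative)
SAMPLE = """<ROOT>
  <facturic id_user="18446195"><artfacturic/></facturic>
  <facturic id_user="18446195"><artfacturic/></facturic>
  <facturic id_user="34259554"><artfacturic/></facturic>
</ROOT>"""


def group_duplicates(xml_text):
    """Return {id_user: [facturic elements]} for ids that occur more than once."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for node in root.iter("facturic"):
        groups[node.get("id_user")].append(node)
    # Keep only the ids that were seen at least twice
    return {uid: nodes for uid, nodes in groups.items() if len(nodes) > 1}


duplicates = group_duplicates(SAMPLE)
for user_id, nodes in duplicates.items():
    print(user_id, len(nodes))  # prints "18446195 2"
```

Only the id that appears more than once survives the filter, which is the same condition as the len(nodes) > 1 test in the lxml version.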

User

Answer #2 · edited 2024-09-28 19:05:29

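Consider XSLT, a special-purpose language designed for transforming XML, for example keeping only the nodes that share a duplicated attribute value. Python's third-party module lxml can run XSLT 1.0 scripts. Another benefit is that XSLT is portable to other languages and software, so you do not need Python to run it!

Specifically, the stylesheet below uses Muenchian grouping: xsl:key indexes the document by each distinct @id_user. The template match then retrieves only the facturic nodes whose key count is greater than 1.

XSLT (save as an .xsl file, which is a special kind of .xml file)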

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="idkey" match="facturic" use="@id_user" />

  <xsl:template match="/ROOT">
    <xsl:copy>
      <xsl:apply-templates select="facturic"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="facturic[count(key('idkey', @id_user)) > 1]">
    <xsl:copy>
        <xsl:copy-of select="*|@*"/>    
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Python (no for loops or if logic)


Output

<?xml version="1.0"?>
<ROOT>
  <facturic id_user="18446195">
    <artfacturic/>
  </facturic>
  <facturic id_user="18446195">
    <artfacturic/>
  </facturic>
</ROOT>
