Python: creating a CSV from XML with varying nested elements

Posted 2024-10-02 06:35:29


This is my XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ccts="urn:oasis:names:specification:ubl:schema:xsd:CoreComponentParameters-2" xmlns:sdt="urn:oasis:names:specification:ubl:schema:xsd:SpecializedDatatypes-2" xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2 UBL-Invoice-2.0.xsd">
       <cbc:ID>102165444</cbc:ID>
       <cac:InvoiceLine>
          <cbc:ID>1.0000</cbc:ID>
          <cbc:Note />
          <cbc:InvoicedQuantity unitCode="CT">1.0000</cbc:InvoicedQuantity>
          <cbc:LineExtensionAmount currencyID="DKK">142.3900</cbc:LineExtensionAmount>
          <cac:TaxTotal>
             <cbc:TaxAmount currencyID="DKK">138.24</cbc:TaxAmount>
             <cac:TaxSubtotal>
                <cbc:TaxableAmount currencyID="DKK">142.39</cbc:TaxableAmount>
                <cbc:TaxAmount currencyID="DKK">7.20</cbc:TaxAmount>
                <cac:TaxCategory>
                   <cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxcategoryid-1.1">3645</cbc:ID>
                   <cac:TaxScheme>
                      <cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxschemeid-1.1">140</cbc:ID>
                      <cbc:Name>Afgift</cbc:Name>
                      <cbc:TaxTypeCode listAgencyID="320" listID="urn:oioubl:codelist:taxtypecode-1.1">StandardRated</cbc:TaxTypeCode>
                   </cac:TaxScheme>
                </cac:TaxCategory>
             </cac:TaxSubtotal>
          </cac:TaxTotal>
       </cac:InvoiceLine>
       <cac:InvoiceLine>
          <cbc:ID>2.0000</cbc:ID>
          <cbc:Note />
          <cbc:InvoicedQuantity unitCode="CT">1.0000</cbc:InvoicedQuantity>
          <cbc:LineExtensionAmount currencyID="DKK">142.3900</cbc:LineExtensionAmount>
          <cac:TaxTotal>
             <cbc:TaxAmount currencyID="DKK">138.24</cbc:TaxAmount>
             <cac:TaxSubtotal>
                <cbc:TaxableAmount currencyID="DKK">142.39</cbc:TaxableAmount>
                <cbc:TaxAmount currencyID="DKK">7.20</cbc:TaxAmount>
                <cac:TaxCategory>
                   <cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxcategoryid-1.1">3645</cbc:ID>
                   <cac:TaxScheme>
                      <cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxschemeid-1.1">140</cbc:ID>
                      <cbc:Name>Afgift</cbc:Name>
                      <cbc:TaxTypeCode listAgencyID="320" listID="urn:oioubl:codelist:taxtypecode-1.1">StandardRated</cbc:TaxTypeCode>
                   </cac:TaxScheme>
                </cac:TaxCategory>
             </cac:TaxSubtotal>
          </cac:TaxTotal>
          <cac:TaxTotal>
             <cbc:TaxAmount currencyID="DKK">35.60</cbc:TaxAmount>
             <cac:TaxSubtotal>
                <cbc:TaxableAmount currencyID="DKK">142.39</cbc:TaxableAmount>
                <cbc:TaxAmount currencyID="DKK">35.60</cbc:TaxAmount>
                <cac:TaxCategory>
                   <cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxcategoryid-1.1">StandardRated</cbc:ID>
                   <cbc:Percent>25</cbc:Percent>
                   <cac:TaxScheme>
                      <cbc:ID schemeAgencyID="320" schemeID="urn:oioubl:id:taxschemeid-1.1">63</cbc:ID>
                      <cbc:Name>Moms</cbc:Name>
                   </cac:TaxScheme>
                </cac:TaxCategory>
             </cac:TaxSubtotal>
          </cac:TaxTotal>
       </cac:InvoiceLine>
    </Invoice>

As you can see, the file has one ID and several "InvoiceLine" elements, each with its own ID and other child elements.

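What I want to do is create a CSV file with one row per invoice line, containing information from specific nested elements. The challenge is that each line can have several "TaxTotal" child elements. When that happens, I need an additional row carrying the information from each extra TaxTotal.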

How can I do this?


1 answer
User
#1 · Posted 2024-10-02 06:35:29

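Since there is always at least one TaxTotal element, I would create a new CSV row for each TaxTotal and reach back up the tree for the values that come earlier (the invoice ID and the line-level fields).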
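As a minimal sketch of that idea (not the code from this answer), the snippet below uses the standard library's `xml.etree.ElementTree` instead of lxml, on a cut-down, hypothetical sample string. ElementTree elements have no parent pointer, so rather than stepping up from each TaxTotal it loops over every InvoiceLine and then over its TaxTotal children, repeating the line-level values in each row. The names `SAMPLE`, `text_of` and `rows_per_tax_total` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for the paths below (URIs taken from the question's XML).
NS = {
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
}

# Cut-down, hypothetical sample: one invoice line with two TaxTotal children.
SAMPLE = """\
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>102165444</cbc:ID>
  <cac:InvoiceLine>
    <cbc:ID>2.0000</cbc:ID>
    <cac:TaxTotal><cbc:TaxAmount currencyID="DKK">138.24</cbc:TaxAmount></cac:TaxTotal>
    <cac:TaxTotal><cbc:TaxAmount currencyID="DKK">35.60</cbc:TaxAmount></cac:TaxTotal>
  </cac:InvoiceLine>
</Invoice>
"""


def text_of(element, path):
    """Return the text of the first match for path, or "" if missing or empty."""
    found = element.find(path, NS)
    if found is None:
        return ""
    return found.text or ""


def rows_per_tax_total(root):
    """Yield one row per TaxTotal, repeating invoice- and line-level values."""
    invoice_id = text_of(root, "cbc:ID")
    for line in root.findall("cac:InvoiceLine", NS):
        line_id = text_of(line, "cbc:ID")
        for tax_total in line.findall("cac:TaxTotal", NS):
            yield [invoice_id, line_id, text_of(tax_total, "cbc:TaxAmount")]


rows = list(rows_per_tax_total(ET.fromstring(SAMPLE)))
for row in rows:
    print(";".join(row))
```

Each row repeats the invoice ID and line ID, so a line with two TaxTotal elements simply yields two rows.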

Here is an example using lxml. I added a helper function to make empty values easier to handle; any other formatting of the values is left to you.

Python 3.6

    from lxml import etree
    import csv


    def get_value(target_tree, xpath, namespaces):
        """Return the text of the first node matching xpath, or "" if there is none."""
        try:
            return target_tree.xpath(xpath, namespaces=namespaces)[0].text or ""
        except IndexError:
            return ""


    tree = etree.parse("input.xml")

    # Prefix-to-URI map for the XPath expressions. "i2" is an arbitrary prefix
    # chosen here for the document's default (Invoice-2) namespace.
    ns = {"cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
          "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
          "i2": "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"}

    # newline="" lets the csv module control the line endings itself.
    with open("output.csv", "w", newline="") as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=";", lineterminator="\n", quoting=csv.QUOTE_MINIMAL)
        # Header
        csvwriter.writerow(["ID", "/InvoiceLine/ID", "/InvoiceLine/InvoicedQuantity", "/InvoiceLine/LineExtensionAmount",
                            "/InvoiceLine/TaxTotal/TaxAmount", "/InvoiceLine/TaxTotal/TaxSubtotal/TaxableAmount",
                            "/InvoiceLine/TaxTotal/TaxSubtotal/TaxAmount",
                            "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/ID",
                            "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/Percent",
                            "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/ID",
                            "/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/Name"])
        # One row per TaxTotal. Only TaxTotal elements directly under an InvoiceLine
        # are selected, so ".." in the paths below is always the enclosing line.
        for tax_total in tree.xpath("//cac:InvoiceLine/cac:TaxTotal", namespaces=ns):
            csvwriter.writerow([get_value(tax_total, "/i2:Invoice/cbc:ID", ns),  # invoice-level ID
                                get_value(tax_total, "../cbc:ID", ns),  # line-level values
                                get_value(tax_total, "../cbc:InvoicedQuantity", ns),
                                get_value(tax_total, "../cbc:LineExtensionAmount", ns),
                                get_value(tax_total, "cbc:TaxAmount", ns),  # values below this TaxTotal
                                get_value(tax_total, "cac:TaxSubtotal/cbc:TaxableAmount", ns),
                                get_value(tax_total, "cac:TaxSubtotal/cbc:TaxAmount", ns),
                                get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cbc:ID", ns),
                                get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cbc:Percent", ns),
                                get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cac:TaxScheme/cbc:ID", ns),
                                get_value(tax_total, "cac:TaxSubtotal/cac:TaxCategory/cac:TaxScheme/cbc:Name", ns)])

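Output (output.csv):

    ID;/InvoiceLine/ID;/InvoiceLine/InvoicedQuantity;/InvoiceLine/LineExtensionAmount;/InvoiceLine/TaxTotal/TaxAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxableAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxAmount;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/ID;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/Percent;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/ID;/InvoiceLine/TaxTotal/TaxSubtotal/TaxCategory/TaxScheme/Name
    102165444;1.0000;1.0000;142.3900;138.24;142.39;7.20;3645;;140;Afgift
    102165444;2.0000;1.0000;142.3900;138.24;142.39;7.20;3645;;140;Afgift
    102165444;2.0000;1.0000;142.3900;35.60;142.39;35.60;StandardRated;25;63;Moms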
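The answer leaves further value formatting to the reader. As one possible sketch, assuming you want amounts such as `142.3900` written with a fixed number of decimals, a small helper built on the standard `decimal` module could be applied to each extracted string before it goes into the row. The name `format_amount` and the default of two decimal places are assumptions for illustration, not part of the original answer.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def format_amount(text, places=2):
    """Return text rounded to `places` decimals; non-numeric text is returned unchanged."""
    if not text:
        return ""  # empty or missing values stay empty
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return text  # e.g. "StandardRated" passes through untouched
    if not value.is_finite():
        return text  # leave "NaN" / "Infinity" alone
    quantum = Decimal(10) ** -places  # Decimal("0.01") for places=2
    return str(value.quantize(quantum, rounding=ROUND_HALF_UP))


for raw in ["142.3900", "7.20", "", "StandardRated"]:
    print(repr(raw), "->", repr(format_amount(raw)))
```

Using `Decimal` rather than `float` avoids binary rounding surprises on monetary values. Apply the helper only to columns that hold amounts, because identifier-like numeric values (for example `3645`) would otherwise be rewritten as `3645.00`.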
