How do I use lxml.etree?

Posted 2024-10-02 18:19:38


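I have an XML document like the input below.

I want to aggregate the Sales values for the Diesel fuel type.

How can I iterate over all <Tank> elements and read the fuelItem attribute to find multiple instances of the same fuel type, then sum their Sales attribute values?

Input: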

 <EnterpriseDocument>
      <FuelTankList>
        <Tank fuelItem="Petrol" Sales="1000" />
        <Tank  fuelItem="Diesel" Sales="2000" />
        <Tank  fuelItem="Diesel" Sales="3000" />
      </FuelTankList>
    </EnterpriseDocument>

Preferred output:

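<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000" />
    <Tank fuelItem="Diesel" Sales="5000" />
  </FuelTankList>
</EnterpriseDocument>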

3 answers

Since you are using lxml, you can use XSLT with Muenchian Grouping to group the Tank elements by their fuelItem attribute.

Example...

XML input (input.xml)

<EnterpriseDocument>
    <FuelTankList>
        <Tank fuelItem="Petrol" Sales="1000" />
        <Tank  fuelItem="Diesel" Sales="2000" />
        <Tank  fuelItem="Diesel" Sales="3000" />
    </FuelTankList>
</EnterpriseDocument>

XSLT 1.0 (test.xsl)

[stylesheet not preserved in this copy]

Python

from lxml import etree

tree = etree.parse("input.xml")
xslt = etree.parse("test.xsl")

new_tree = tree.xslt(xslt)

print(etree.tostring(new_tree, pretty_print=True).decode("utf-8"))

Output (stdout)

<EnterpriseDocument>
  <FuelTankList>
    <Tank fuelItem="Petrol" Sales="1000"/>
    <Tank fuelItem="Diesel" Sales="5000"/>
  </FuelTankList>
</EnterpriseDocument>
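For readers who want to try the grouping approach end to end, here is a minimal sketch of what a Muenchian grouping stylesheet for this input might look like, embedded in Python so it runs as one file. The stylesheet text, the key name tanks-by-fuel, and the variable names are assumptions made for illustration; this is not the answer's original test.xsl. It uses etree.XSLT rather than tree.xslt(), which is simply another lxml entry point for the same transform.

```python
from lxml import etree

# Hypothetical stylesheet: identity transform plus Muenchian grouping.
# It is an illustration only, not the original test.xsl.
XSLT_SOURCE = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index every Tank in the document by its fuelItem attribute. -->
  <xsl:key name="tanks-by-fuel" match="Tank" use="@fuelItem"/>

  <!-- Identity template: copy anything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- First Tank of each group: emit one Tank with the summed Sales. -->
  <xsl:template match="Tank[generate-id() = generate-id(key('tanks-by-fuel', @fuelItem)[1])]">
    <Tank fuelItem="{@fuelItem}" Sales="{sum(key('tanks-by-fuel', @fuelItem)/@Sales)}"/>
  </xsl:template>

  <!-- Remaining Tanks of a group are dropped. -->
  <xsl:template match="Tank"/>
</xsl:stylesheet>
"""

INPUT_XML = b"""\
<EnterpriseDocument>
    <FuelTankList>
        <Tank fuelItem="Petrol" Sales="1000" />
        <Tank fuelItem="Diesel" Sales="2000" />
        <Tank fuelItem="Diesel" Sales="3000" />
    </FuelTankList>
</EnterpriseDocument>
"""

# Compile the stylesheet once, then apply it to the parsed input.
transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(INPUT_XML))

# Collect the grouped values for inspection.
totals = {t.get("fuelItem"): t.get("Sales") for t in result.getroot().iter("Tank")}

print(str(result))
```

Note that the key in this sketch spans the whole document, so Tank elements in different FuelTankList parents would be merged into one group; a per-list grouping would need the parent's identity in the key value.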

Try this:

from lxml import etree

# Parse the input XML file.
# Passing the path lets lxml open and close the file itself.
tree = etree.parse("so-input.xml")

# Collect Tank element attributes here.
tanks = {}

# The FuelTankList element whose children we will change.
fuel_tank_list = None

# Loop over all Tank elements, collect their values, remove them.
for tank in tree.xpath("//Tank"):
    # Get attributes.
    fuel_item = tank.get("fuelItem")
    sales = tank.get("Sales")

    # Add to sales sum.
    existing_sales = tanks.get(fuel_item, 0)
    tanks[fuel_item] = existing_sales + int(sales)

    # Remove <Tank>
    fuel_tank_list = tank.getparent()
    fuel_tank_list.remove(tank)

# Create a new Tank element for each fuelItem value.
for fuel_item, sales in tanks.items():
    new_tank = etree.Element("Tank")
    new_tank.attrib["fuelItem"] = fuel_item
    new_tank.attrib["Sales"] = str(sales)
    fuel_tank_list.append(new_tank)

# Write the modified tree to a new file.
with open("so-output.xml", "wb") as f:
    f.write(etree.tostring(tree, pretty_print=True))

Output of $ xmllint -format so-output.xml:

[output not preserved in this copy]

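Hope this helps. The code iterates over each FuelTankList, gets its Tank elements, reads their values, and removes them. Once the values are collected and summed, new Tank elements carrying the totals are appended to the FuelTankList.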

import lxml.etree as le

xml = """<EnterpriseDocument><FuelTankList><Tank fuelItem="Petrol" Sales="1000" />
        <Tank  fuelItem="Diesel" Sales="2000" />
        <Tank  fuelItem="Diesel" Sales="3000" />
      </FuelTankList>
    </EnterpriseDocument>"""

root = le.fromstring(xml)

#get all the fueltanklists from the file

fueltanklist = root.xpath('//FuelTankList')
for fuellist in fueltanklist:
    tankdict={}
    #get all the tanks in the current fueltanklist

    tanks = fuellist.xpath('child::Tank')
    for tank in tanks:
        fuelitem = tank.attrib['fuelItem']
        sales = tank.attrib['Sales']
        if fuelitem in tankdict:
            tankdict[fuelitem] += int(sales)
        else:
            tankdict[fuelitem] = int(sales)

        #Once we have retrieved the value of the current tank, delete it from its parent

        tank.getparent().remove(tank)
    for key, value in tankdict.items():
        #Create and add tanks with new values to its parent
        #(the attribute is named Sales to match the input and the requested output)
        newtank = le.Element("Tank", fuelItem=str(key), Sales=str(value))
        fuellist.append(newtank)

#Store the entire xml in a new string

newxml = le.tostring(root)
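If only the totals are needed and the tree does not have to be rewritten, the sum can be computed directly. The sketch below shows two ways, assuming the same input as the question: XPath's sum() for a single fuel type, and a collections.Counter for all fuel types at once. The variable names are illustrative and not part of any answer above.

```python
from collections import Counter

from lxml import etree

XML = b"""\
<EnterpriseDocument>
    <FuelTankList>
        <Tank fuelItem="Petrol" Sales="1000" />
        <Tank fuelItem="Diesel" Sales="2000" />
        <Tank fuelItem="Diesel" Sales="3000" />
    </FuelTankList>
</EnterpriseDocument>
"""

root = etree.fromstring(XML)

# XPath 1.0 sum() over the Sales attributes of matching Tank elements.
# The $fuel variable is bound by keyword argument, which avoids building
# the expression with string formatting. lxml returns the sum as a float.
diesel_total = root.xpath("sum(//Tank[@fuelItem = $fuel]/@Sales)", fuel="Diesel")

# Totals for every fuel type in one pass, without modifying the tree.
totals = Counter()
for tank in root.iter("Tank"):
    totals[tank.get("fuelItem")] += int(tank.get("Sales"))

print(diesel_total)
print(dict(totals))
```

The XPath form is convenient for a one-off check on a single fuel type; the Counter loop is the better fit when the totals feed into building new Tank elements as the answers above do.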
