Extracting data from an XML file with Python

Posted 2024-09-30 07:28:23

I am trying to extract some data from this file:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
    <d2LogicalModel xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datex2.eu/schema/2/2_0" modelBaseVersion="2">
        <exchange>
            <supplierIdentification>
                <country>nl</country>
                <nationalIdentifier>NDW-CNS</nationalIdentifier>
            </supplierIdentification>
        </exchange>
        <payloadPublication xsi:type="MeasuredDataPublication" lang="nl">
            <publicationTime>2014-12-04T06:59:55.000Z</publicationTime>
            <publicationCreator>
                <country>nl</country>
                <nationalIdentifier>NDW-CNS</nationalIdentifier>
            </publicationCreator>
            <measurementSiteTableReference id="NDW01_MT" version="662" targetClass="MeasurementSiteTable"/>
            <headerInformation>
                <confidentiality>noRestriction</confidentiality>
                <informationStatus>real</informationStatus>
            </headerInformation>
            <siteMeasurements>
                <measurementSiteReference id="GEO03_D4T-RWS_T_0317_ID_324" version="3" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1">
                    <measuredValue>
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfInputValuesUsed="100" standardDeviation="7">
                                <duration>34</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>
            <siteMeasurements>
                <measurementSiteReference id="GEO01_Z_RWSTRN054" version="1" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1" xsi:type="_SiteMeasurementsIndexMeasuredValue">
                    <measuredValue xsi:type="MeasuredValue">
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfIncompleteInputs="0" numberOfInputValuesUsed="7" standardDeviation="0.71" supplierCalculatedDataQuality="100.0">
                                <duration>56</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>
           .
           .
           .
           .
           .
           <siteMeasurements>
                <measurementSiteReference id="RWS01_MONIBAS_0091hrr0350ra0" version="1" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1" xsi:type="_SiteMeasurementsIndexMeasuredValue">
                    <measuredValue xsi:type="MeasuredValue">
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfIncompleteInputs="0">
                                <duration>23</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>
        </payloadPublication>
    </d2LogicalModel>
</soap:Body>

What I want to do is use Python to extract, from each "siteMeasurements" element, the value of the "id" attribute of "measurementSiteReference" and the text content of "duration".

My code so far:

import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='track.xml')
root = tree.getroot()

# Print the tag and the attribute dictionary of every element in the tree
for elem in tree.iter():
    print(elem.tag, elem.attrib)

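But I am struggling to extract these values. I have no experience with Python.

How do I iterate over "siteMeasurements" and get the value of the "id" attribute of measurementSiteReference and the text content of "duration"?

Please give me some pointers to help me on my way.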


Tags: id, http, type, country, soap, duration, xmlns, xsi
1 answer
User
#1 · Posted 2024-09-30 07:28:23

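You are probably missing the </soap:Envelope> tag at the bottom of your XML file, or perhaps you did not paste the whole file. In any case, after adding that closing tag and putting the following XML declaration at the top (first line), I was able to run it.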

<?xml version="1.0" encoding="UTF-8"?>
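To see why the missing closing tag matters, here is a minimal sketch. The shortened `truncated` string and the helper name `is_well_formed` are hypothetical stand-ins for illustration, not part of the original file or code; the point is only that the parser rejects a document whose root element is never closed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the document in the question:
# the root element <soap:Envelope> is opened but never closed.
truncated = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body><payload/></soap:Body>'
)


def is_well_formed(xml_text):
    """Return True if xml_text parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(truncated))                       # closing tag missing
print(is_well_formed(truncated + '</soap:Envelope>'))  # closing tag restored
```

If parsing the real file raises `ParseError`, checking the end of the file for the closing root tag is a reasonable first step.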

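First, we need to figure out which elements we can work with. Iterating over the tree and printing each element gives output like the following (truncated):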

<Element '{http://schemas.xmlsoap.org/soap/envelope/}Envelope' at 0x29e4170>
<Element '{http://schemas.xmlsoap.org/soap/envelope/}Body' at 0x29e4190>
|
|
<Element '{http://datex2.eu/schema/2/2_0}measurementSiteTableReference' at 0x29e4510>
|
|
<Element '{http://datex2.eu/schema/2/2_0}duration' at 0x29e4750>
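A listing like the one above can be reproduced with a sketch along these lines. The `SAMPLE` string is a hypothetical, heavily simplified stand-in for the real file (the nesting under `siteMeasurements` is flattened), so only the tag names, not the structure, should be taken from it. Note how ElementTree reports every tag in the form `{namespace URI}localname`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; the real file is larger and more deeply nested.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
  <d2LogicalModel xmlns="http://datex2.eu/schema/2/2_0">
    <siteMeasurements>
      <measurementSiteReference id="GEO01_Z_RWSTRN054"/>
      <duration>56</duration>
    </siteMeasurements>
  </d2LogicalModel>
</soap:Body>
</soap:Envelope>"""

root = ET.fromstring(SAMPLE)

# root.iter() walks the root element and all of its descendants in document order.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)
```

The namespace-qualified names printed here are the strings to pass to `iter()` or `find()` when selecting specific elements.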

Once we know the element names, we just iterate over the elements we need and read their attributes and text.

Code

import xml.etree.ElementTree as ET

data_file = 'soapData2.xml'
tree = ET.parse(data_file)
root = tree.getroot()

# Fully qualified tag names have the form "{namespace URI}localname"
t1 = "{http://datex2.eu/schema/2/2_0}measurementSiteReference"
t2 = "{http://datex2.eu/schema/2/2_0}duration"

print("measurementSiteReference ", ": duration")
# zip pairs the n-th site reference with the n-th duration in document order
for e1, e2 in zip(root.iter(t1), root.iter(t2)):
    print(e1.attrib['id'], ":", e2.text)

Result

>>> 
measurementSiteReference  : duration
GEO03_D4T-RWS_T_0317_ID_324 : 34
GEO01_Z_RWSTRN054 : 56
RWS01_MONIBAS_0091hrr0350ra0 : 23
>>> 
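The code above pairs references and durations purely by position, which works as long as every "siteMeasurements" element contains exactly one "duration". As a hedged alternative, the sketch below iterates over each "siteMeasurements" element and looks up its own children, so a site without a duration cannot shift the pairing. The prefix `d2`, the helper name `site_durations`, the trimmed `SAMPLE`, and the site id `HYPOTHETICAL_SITE_WITHOUT_DURATION` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The prefix "d2" is an arbitrary local name; only the URI must match the file.
NS = {"d2": "http://datex2.eu/schema/2/2_0"}

# Hypothetical, trimmed sample; the middle site is invented to show a missing duration.
SAMPLE = """<d2LogicalModel xmlns="http://datex2.eu/schema/2/2_0">
  <siteMeasurements>
    <measurementSiteReference id="GEO03_D4T-RWS_T_0317_ID_324"/>
    <measuredValue index="1"><measuredValue><basicData>
      <travelTime><duration>34</duration></travelTime>
    </basicData></measuredValue></measuredValue>
  </siteMeasurements>
  <siteMeasurements>
    <measurementSiteReference id="HYPOTHETICAL_SITE_WITHOUT_DURATION"/>
  </siteMeasurements>
  <siteMeasurements>
    <measurementSiteReference id="GEO01_Z_RWSTRN054"/>
    <measuredValue index="1"><measuredValue><basicData>
      <travelTime><duration>56</duration></travelTime>
    </basicData></measuredValue></measuredValue>
  </siteMeasurements>
</d2LogicalModel>"""


def site_durations(root):
    """Return a list of (site id, [duration texts]) per siteMeasurements element."""
    rows = []
    for site in root.iter("{%s}siteMeasurements" % NS["d2"]):
        ref = site.find("d2:measurementSiteReference", NS)
        site_id = ref.get("id") if ref is not None else None
        # ".//" searches all descendants of this site, not just direct children.
        durations = [d.text for d in site.findall(".//d2:duration", NS)]
        rows.append((site_id, durations))
    return rows


rows = site_durations(ET.fromstring(SAMPLE))
for site_id, durations in rows:
    print(site_id, ":", ", ".join(durations) or "(no duration)")
```

For a file on disk, `ET.parse(data_file).getroot()` can be passed to `site_durations` in place of the parsed sample string.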
