用Python从XML获取数据

2024-06-28 16:00:04 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图理解如何用Python从XML文件中提取某些数据。你知道吗

目前,我正在从API中提取信息并获取XML文件,但我希望直接从XML中获取特定信息。你知道吗

从我所能找到的似乎元素树是答案,但我发现它很难理解,真的不确定它是正确的方式来创建一个解决方案。你知道吗

我在下面留下了用于获取XML数据的代码,以及它提供给我的一个缩短的XML文件(只留下了需要提取的重要部分)。你知道吗

谢谢你。你知道吗

import requests


#Import routes
routes=[]



class routesClass:
    def __init__(self,name,url):#,start,end,offset,rwe,al):
        self.n=name
        self.u=url
        #self.s=start
        #self.e=end
        #self.o=offset
        #self.r=rwe
        #self.a=al

#Add example route
testRoute1=routesClass("EasternFwy-Hoddle/Johnston","https://api.tomtom.com/routing/1/calculateRoute/-37.79205923474775,145.03010268799338:-37.798883995180496,145.03040309540322:-37.807106781970354,145.02895470253526:-37.80320743019992,145.01021142594075:-37.7999012967757,144.99318476311566:?routeType=shortest&key=SECRETKEY&computeTravelTimeFor=all")
routes.append(testRoute1)
#routes.append(testRoute2)

print(routes[0].u)

还有XML之类的东西。你知道吗

<summary>
<lengthInMeters>5144</lengthInMeters>
<travelTimeInSeconds>764</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
<leg>
<summary>
<lengthInMeters>806</lengthInMeters>
<travelTimeInSeconds>67</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2017-12-28T14:42:14+11:00</departureTime>
<arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
<noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
<historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
<liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
</summary>

Tags: 文件数据self信息xmlsummaryroutesdeparturetime
1条回答
网友
1楼 · 发布于 2024-06-28 16:00:04

我推荐lxml。在我看来,浏览xml树比浏览元素树更容易。下面是如何使用模块的demo。你知道吗

一个例子 以您的xml为例,我将使用lxml解析它。如果你把代码保存为示例.xml以及xmlparse.py文件你知道吗

示例.xml-您提供的XML格式不正确。你知道吗

  • 它没有将两个摘要部分分组的父xml标记。你知道吗
  • 在两个摘要部分的中间有一个随机的<leg>标记。你知道吗

这两个问题不允许它进行解析,因此我删除了<leg>标记,并将两个摘要部分分组在一个<parent>标记中。这是XML。你知道吗

<parent>
    <summary>
        <lengthInMeters>5144</lengthInMeters>
        <travelTimeInSeconds>764</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2017-12-28T14:42:14+11:00</departureTime>
        <arrivalTime>2017-12-28T14:54:58+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>764</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>764</liveTrafficIncidentsTravelTimeInSeconds>
    </summary>
    <summary>
        <lengthInMeters>806</lengthInMeters>
        <travelTimeInSeconds>67</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2017-12-28T14:42:14+11:00</departureTime>
        <arrivalTime>2017-12-28T14:43:21+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>67</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>67</liveTrafficIncidentsTravelTimeInSeconds>
    </summary>
</parent>

xmlparse.py文件-在这个脚本中,我为您提供了一个打印密钥的循环(元素文本)和值(文本)以及一个逻辑语句,该语句检查其中一个键是否存在以及其值是否大于700,。这只是为了帮助您理解如何在循环中添加触发器。你知道吗

from lxml import etree

def parseXML(xmlFile):
    """
    Parse the xml
    """
    with open(xmlFile) as fobj:
        xml = fobj.read()

    root = etree.fromstring(xml)

    for appt in root.getchildren():
        for elem in appt.getchildren():
            if not elem.text:
                text = "None"
            else:
                text = elem.text

            ##This is doing something with the xml based on it's tag and value.
            if elem.tag == 'travelTimeInSeconds' and int(text) > 700:
                print('******** Do something with ', elem.tag, ' : ', text)
            print(elem.tag + " => " + text)

if __name__ == "__main__":
    parseXML("example.xml")

如果将代码保存为xmlparse.py文件并将更新后的xml保存在示例.xml文件运行脚本时将收到以下输出:

lengthInMeters => 5144
******** Do something with  travelTimeInSeconds  :  764
travelTimeInSeconds => 764
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:54:58+11:00
noTrafficTravelTimeInSeconds => 478
historicTrafficTravelTimeInSeconds => 764
liveTrafficIncidentsTravelTimeInSeconds => 764
lengthInMeters => 806
travelTimeInSeconds => 67
trafficDelayInSeconds => 0
departureTime => 2017-12-28T14:42:14+11:00
arrivalTime => 2017-12-28T14:43:21+11:00
noTrafficTravelTimeInSeconds => 59
historicTrafficTravelTimeInSeconds => 67
liveTrafficIncidentsTravelTimeInSeconds => 67

相关问题 更多 >