Python: Extract data from an XML file

Posted 2024-09-28 21:08:59


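Given the following XML file, which stores vehicle trip information, how can I generate a text file containing the cumulative distance travelled at each time step? The elements in the XML are in no particular order; they are random.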

<tripinfos>
        <tripinfo id="1" depart="1.00" arrival="2" duration="1.00" distance="3"/>
        <tripinfo id="5" depart="2.00" arrival="4" duration="2.00" distance="5"/>
        <tripinfo id="10" depart="5.00" arrival="8" duration="3.00" distance="1"/>
        <tripinfo id="3" depart="3.00" arrival="6" duration="3.00" distance="2"/>
        <tripinfo id="8" depart="8.00" arrival="10" duration="2.00" distance="4"/>
</tripinfos>

Expected output.text file:

0 //Time step #0
0
3
3
8
8
10
10
11
11
15
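Reading the expected output against the data, the value at time step t appears to be the total distance of all trips whose arrival is at or before t; for example, at step 4 the trips arriving at 2 and 4 give 3 + 5 = 8. A brute-force sketch of that interpretation, assuming the XML is held in a string (`xml_text` is a name chosen for this sketch):

```python
import xml.etree.ElementTree as ET

# The sample data from the question, held in a string for this sketch
xml_text = """<tripinfos>
    <tripinfo id="1" depart="1.00" arrival="2" duration="1.00" distance="3"/>
    <tripinfo id="5" depart="2.00" arrival="4" duration="2.00" distance="5"/>
    <tripinfo id="10" depart="5.00" arrival="8" duration="3.00" distance="1"/>
    <tripinfo id="3" depart="3.00" arrival="6" duration="3.00" distance="2"/>
    <tripinfo id="8" depart="8.00" arrival="10" duration="2.00" distance="4"/>
</tripinfos>"""

root = ET.fromstring(xml_text)
# (arrival, distance) pairs; the document order does not matter here
trips = [(float(t.get('arrival')), float(t.get('distance')))
         for t in root.iter('tripinfo')]

last_step = int(max(a for a, _ in trips))
# Value at step t = sum of distances of all trips that have arrived by t
totals = [sum(d for a, d in trips if a <= t) for t in range(last_step + 1)]
print([int(x) for x in totals])
```

This rescans every trip at every step, so it only pins down the definition; the answers below compute the same numbers more efficiently.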

3 answers

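I'm sure there is a better solution using an XML library, but here is a simple one. It reads the file line by line, so it assumes one <tripinfo> element per line with the attributes in the order shown above: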

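import numpy as np

# Read the file line by line; assumes the root open/close tags are on the
# first and last lines and each <tripinfo .../> element is on its own line
with open('file.xml') as a:
    lines = a.readlines()

trip_lines = lines[1:-1]
my_arr = np.zeros((len(trip_lines), 2))
for i, line in enumerate(trip_lines):
    # Splitting on '"' puts the attribute values at odd indices:
    # 1=id, 3=depart, 5=arrival, 7=duration, 9=distance
    contents = line.split('"')
    my_arr[i, 0] = float(contents[5])  # arrival
    my_arr[i, 1] = float(contents[9])  # distance

# Now sort according to arrival times
my_arr = my_arr[my_arr[:, 0].argsort()]
print(my_arr)

final_output = []
cum_dist = 0
last_index = 0

for i in range(int(my_arr[-1, 0]) + 1):
    # Add every trip that has arrived by time step i; the while loop also
    # handles several trips with the same (or a non-integer) arrival time
    while last_index < len(my_arr) and my_arr[last_index, 0] <= i:
        cum_dist += my_arr[last_index, 1]
        last_index += 1
    final_output.append(int(cum_dist))

print(final_output)
# Write one value per line, as in the requested output file
np.savetxt('outputfile.txt', np.array(final_output), fmt='%d')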

The output is:

[[  2.   3.]
 [  4.   5.]
 [  6.   2.]
 [  8.   1.]
 [ 10.   4.]]
[0, 0, 3, 3, 8, 8, 10, 10, 11, 11, 15]
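The answer above notes that an XML library would be cleaner. As one such sketch (an alternative technique, not the answerer's code), the elements can be parsed with the standard-library xml.etree.ElementTree, the distances summed per arrival step with np.bincount(..., weights=...), and the running total taken with np.cumsum. It assumes arrival times are non-negative and truncates them to whole steps; the file name output.txt is only an example.

```python
import xml.etree.ElementTree as ET
import numpy as np

# The sample data from the question, held in a string for this sketch
xml_text = """<tripinfos>
    <tripinfo id="1" depart="1.00" arrival="2" duration="1.00" distance="3"/>
    <tripinfo id="5" depart="2.00" arrival="4" duration="2.00" distance="5"/>
    <tripinfo id="10" depart="5.00" arrival="8" duration="3.00" distance="1"/>
    <tripinfo id="3" depart="3.00" arrival="6" duration="3.00" distance="2"/>
    <tripinfo id="8" depart="8.00" arrival="10" duration="2.00" distance="4"/>
</tripinfos>"""

root = ET.fromstring(xml_text)
trips = root.findall('tripinfo')
# Truncate arrival times to whole time steps (assumes non-negative times)
arrival = np.array([int(float(t.get('arrival'))) for t in trips])
distance = np.array([float(t.get('distance')) for t in trips])

# Total distance of the trips arriving at each step 0..max(arrival)
per_step = np.bincount(arrival, weights=distance)
# Running total over the steps
cumulative = np.cumsum(per_step)

# One value per line; the file name is just an example
np.savetxt('output.txt', cumulative, fmt='%d')
print(cumulative.astype(int).tolist())
```

Because bincount adds up every trip that falls into the same step, trips sharing an arrival time need no special handling, and the input order is irrelevant.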

Here is a partial solution to your problem.

# import some packages
from numpy import array
import xml.etree.ElementTree as et

# init some lists
ids=[]
depart=[]
arrival=[]
duration=[]
distance=[]

# prep the xml document
xmltxt = """
    <root>
        <tripinfo id="1" depart="1.00" arrival="2" duration="1.00" distance="3"/>
        <tripinfo id="5" depart="2.00" arrival="4" duration="2.00" distance="5"/>
        <tripinfo id="3" depart="3.00" arrival="6" duration="3.00" distance="2"/>
        <tripinfo id="10" depart="5.00" arrival="8" duration="3.00" distance="1"/>
        <tripinfo id="8" depart="8.00" arrival="10" duration="2.00" distance="4"/>
    </root>
"""

# parse the xml text
xmldoc = et.fromstring(xmltxt)

# extract and output tripinfo attributes
# collect them into lists
for item in xmldoc.iterfind('tripinfo'):
    att=item.attrib
    ids.append(int(att['id']))
    depart.append(float(att['depart']))
    arrival.append(float(att['arrival']))
    duration.append(float(att['duration']))
    distance.append(float(att['distance']))

# put lists into an np.array
# and transpose it    
arr=array([ids, depart, arrival, duration, distance]).T

# sort array by 'depart' column. (index=1)
arr = arr[arr[:,1].argsort()]

sumdist=0
dept=0
print("depart: %s; Sum_dist= %s" % (dept, sumdist))
for ea in arr:
    sumdist += ea[4] # distance
    dept = ea[1]  # depart
    # get 'arrival', 'duration' here, so that
    # you can use them to manipulate and get your exact solution
    print("depart: %s; Sum_dist= %s" % (dept, sumdist))

The output is:

depart: 0; Sum_dist= 0
depart: 1.0; Sum_dist= 3.0
depart: 2.0; Sum_dist= 8.0
depart: 3.0; Sum_dist= 10.0
depart: 5.0; Sum_dist= 11.0
depart: 8.0; Sum_dist= 15.0
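This answer stops at running totals per departure. One way to finish it, sketched here as an assumption rather than the answerer's own method, is to sort by the arrival column instead, take a running sum of the distances, and use np.searchsorted to count how many trips have arrived by each time step:

```python
import xml.etree.ElementTree as ET
import numpy as np

# The sample data from the question, held in a string for this sketch
xml_text = """<tripinfos>
    <tripinfo id="1" depart="1.00" arrival="2" duration="1.00" distance="3"/>
    <tripinfo id="5" depart="2.00" arrival="4" duration="2.00" distance="5"/>
    <tripinfo id="10" depart="5.00" arrival="8" duration="3.00" distance="1"/>
    <tripinfo id="3" depart="3.00" arrival="6" duration="3.00" distance="2"/>
    <tripinfo id="8" depart="8.00" arrival="10" duration="2.00" distance="4"/>
</tripinfos>"""

root = ET.fromstring(xml_text)
# Columns: arrival, distance
arr = np.array([[float(t.get('arrival')), float(t.get('distance'))]
                for t in root.iter('tripinfo')])

# Sort by arrival and accumulate the distances in that order
arr = arr[arr[:, 0].argsort()]
arrival_sorted = arr[:, 0]
running = np.cumsum(arr[:, 1])

steps = np.arange(int(arrival_sorted[-1]) + 1)
# For each step t: number of trips with arrival <= t
n_arrived = np.searchsorted(arrival_sorted, steps, side='right')
# Prepend 0 so that "no trip has arrived yet" maps to a total of 0
totals = np.concatenate(([0.0], running))[n_arrived]

print(totals.astype(int).tolist())
```

side='right' makes a trip count at the step equal to its arrival time, which matches the expected output (3 already appears at step 2).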

In the end I used a dict keyed by arrival time to store the distances. Since I know the maximum arrival time, I can initialize the keys with a range.

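import xml.etree.ElementTree as ET

filepath = r'tripinfo.xml'
tree = ET.parse(filepath)
root = tree.getroot()

# 7202 = (maximum arrival time in this data set) + 1; any later
# arrival time would raise a KeyError below
n_steps = 7202
mydict = {k: [] for k in range(n_steps)}

# Group trip distances by (truncated) arrival time step
for trip in root.iter('tripinfo'):
    arrived = int(float(trip.get('arrival')))
    distance = float(trip.get('distance'))
    mydict[arrived].append(distance)

mysum = 0

outputfilepath = 'travelledDuration.txt'
# 'w' overwrites the file on each run; 'a' would append to old results
with open(outputfilepath, 'w') as outputfile:
    for i in range(n_steps):
        # Running total of the distance of all trips arrived by step i
        mysum += sum(mydict[i])
        outputfile.write(str(mysum) + "\n")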
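The same dict approach can avoid hard-coding 7202 by taking the last time step from the data itself. A minimal sketch using collections.defaultdict, assuming the same tripinfo layout; to stay self-contained it parses the question's sample data from a string (`xml_text` is a name chosen here) instead of reading tripinfo.xml:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The sample data from the question, held in a string for this sketch
xml_text = """<tripinfos>
    <tripinfo id="1" depart="1.00" arrival="2" duration="1.00" distance="3"/>
    <tripinfo id="5" depart="2.00" arrival="4" duration="2.00" distance="5"/>
    <tripinfo id="10" depart="5.00" arrival="8" duration="3.00" distance="1"/>
    <tripinfo id="3" depart="3.00" arrival="6" duration="3.00" distance="2"/>
    <tripinfo id="8" depart="8.00" arrival="10" duration="2.00" distance="4"/>
</tripinfos>"""

# For a file on disk, ET.parse('tripinfo.xml').getroot() gives the same root
root = ET.fromstring(xml_text)

# Total distance per (truncated) arrival step; unseen steps default to 0.0
per_step = defaultdict(float)
for trip in root.iter('tripinfo'):
    per_step[int(float(trip.get('arrival')))] += float(trip.get('distance'))

# Last step taken from the data instead of a hard-coded range
last_step = max(per_step)

totals = []
running = 0.0
with open('travelledDuration.txt', 'w') as out:
    for step in range(last_step + 1):
        running += per_step[step]
        totals.append(running)
        out.write("%g\n" % running)  # %g prints 3.0 as 3
```

Summing into a float per step, rather than keeping a list per step, uses less memory when there are many trips.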
