XML解析到.txt文件Python

2024-05-20 00:05:27 发布

您现在位置:Python中文网/ 问答频道 /正文

我需要解析这个XML Document,并将日期和时间移到%Y-%m-%d %H:%M:%S格式中,并将变量hourly-qpf和{}移动到制表符分隔的.txt文件中的列中。在

我所做的就是使用以下代码读入XML文件:

page = urllib2.urlopen('http://forecast.weather.gov/MapClick.php?lat=47.6062&lon=-122.3321&FcstType=digitalDWML')
page_content = page.read()
with open('KBFI.xml', 'w') as fid:
    fid.write(page_content)

在这之后我不知所措。我以前只分析过一个XML文档,它看起来与此完全不同。在

编辑

很抱歉之前没有给你们任何东西,但我不知道该用什么模块,因为我只有minidom的经验,它似乎不是正确的选择。我一直在搞元素树,我想出了这个:

^{pr2}$

但是,有一个问题,因为变量是连字符的,我不知道如何将它们改为下划线或删除它们。而且,正因为如此,我不知道我的代码是否有用!在


Tags: 文件代码txt格式时间pagexmlcontent
2条回答

我建议使用xmltoict来解析和提取XML中的数据,因为它可以将XML转换为Python dict,并使用与XML源相同的嵌套方式,因此它简单易用。对于那些熟悉Python语法的人来说,使用Python语法是很自然的,Python dict是完全通用的,这意味着它们能够表达异构和嵌套的数据结构。例如,Pickling Tools Library依赖于Python DIts,用于Python、C++和java数据互操作性,并提供将XML转换为DIX的工具。XMLtoDIT的优点在于它的小型、快速和独立的模块,只需将XML转换为DI.T/P>

作为xmltodict用法的一个示例,下面的脚本下载this XML document,并提取其创建日期以及降水概率和每小时qpf值的列表:

import requests
url='http://forecast.weather.gov/MapClick.php?lat=47.6062&lon=-122.3321&FcstType=digitalDWML'
r = requests.get(url)

import xmltodict
result = xmltodict.parse(r.text)  
cd = result['dwml']['head']['product']['creation-date']['#text']
print("creation-date =",cd)
pop = result['dwml']['data']['parameters']['probability-of-precipitation']['value']
print("\nprobability-of-precipitation =", pop)
hqpf = result['dwml']['data']['parameters']['hourly-qpf']['value']
print("\nhourly-qpf =", hqpf)

以下是运行此脚本的输出(在20150730上):

^{pr2}$

xmltodict可以与“pip install xmltoict”一起安装。它是由martinblech开发的,它的GitHub项目位于https://github.com/martinblech/xmltodict。在

为了访问起始有效时间和结束有效时间,了解它们的数据结构以及它们的位置是很有帮助的。由于这两个值都是包含在相同标签中的一系列值,直观地说,每个系列都应作为键的值形成一个单独的列表,其名称类似于降水概率和每小时qpf。这可以通过打印整个结果dict和检查其中的开始有效时间和结束有效时间的格式来确认,并且可以通过漂亮地打印结果dict(使用import pprint然后运行)来实现pprint.pprint(结果)。对于this XML document,漂亮地打印它的等价dict将生成2000多行,但是start valid time从第26行开始,其值显然是一个列表:

^{3}$

下面是一个脚本,它将创建日期提取并打印为标量值、列表中的所有开始有效时间值、列表中的所有结束有效时间值、列表中的所有降水概率值以及列表中的所有小时qpf值,并打印每个提取列表的长度:

import xmltodict
result = xmltodict.parse(r.text)

cd = result['dwml']['head']['product']['creation-date']['#text']
print("creation-date =",cd)

svt = result['dwml']['data']['time-layout']['start-valid-time']
print("\nstart-valid-time =", svt)
print("number of start-valid-time entries =", len(svt))

evt = result['dwml']['data']['time-layout']['end-valid-time']
print("\nend-valid-time =", evt)
print("number of end-valid-time entries =", len(evt))

pop = result['dwml']['data']['parameters']['probability-of-precipitation']['value']
print("\nprobability-of-precipitation =", pop)
print("number of probability-of-precipitation entries =", len(pop))

hqpf = result['dwml']['data']['parameters']['hourly-qpf']['value']
print("\nhourly-qpf =", hqpf)
print("number of hourly-qpf entries =", len(hqpf))

以下是运行此脚本的输出(在20150731上):

creation-date = 2015-07-31T14:20:30-07:00

start-valid-time = ['2015-07-31T16:00:00-07:00', '2015-07-31T17:00:00-07:00', '2015-07-31T18:00:00-07:00', '2015-07-31T19:00:00-07:00', '2015-07-31T20:00:00-07:00', '2015-07-31T21:00:00-07:00', '2015-07-31T22:00:00-07:00', '2015-07-31T23:00:00-07:00', '2015-08-01T00:00:00-07:00', '2015-08-01T01:00:00-07:00', '2015-08-01T02:00:00-07:00', '2015-08-01T03:00:00-07:00', '2015-08-01T04:00:00-07:00', '2015-08-01T05:00:00-07:00', '2015-08-01T06:00:00-07:00', '2015-08-01T07:00:00-07:00', '2015-08-01T08:00:00-07:00', '2015-08-01T09:00:00-07:00', '2015-08-01T10:00:00-07:00', '2015-08-01T11:00:00-07:00', '2015-08-01T12:00:00-07:00', '2015-08-01T13:00:00-07:00', '2015-08-01T14:00:00-07:00', '2015-08-01T15:00:00-07:00', '2015-08-01T16:00:00-07:00', '2015-08-01T17:00:00-07:00', '2015-08-01T18:00:00-07:00', '2015-08-01T19:00:00-07:00', '2015-08-01T20:00:00-07:00', '2015-08-01T21:00:00-07:00', '2015-08-01T22:00:00-07:00', '2015-08-01T23:00:00-07:00', '2015-08-02T00:00:00-07:00', '2015-08-02T01:00:00-07:00', '2015-08-02T02:00:00-07:00', '2015-08-02T03:00:00-07:00', '2015-08-02T04:00:00-07:00', '2015-08-02T05:00:00-07:00', '2015-08-02T06:00:00-07:00', '2015-08-02T07:00:00-07:00', '2015-08-02T08:00:00-07:00', '2015-08-02T09:00:00-07:00', '2015-08-02T10:00:00-07:00', '2015-08-02T11:00:00-07:00', '2015-08-02T12:00:00-07:00', '2015-08-02T13:00:00-07:00', '2015-08-02T14:00:00-07:00', '2015-08-02T15:00:00-07:00', '2015-08-02T16:00:00-07:00', '2015-08-02T17:00:00-07:00', '2015-08-02T18:00:00-07:00', '2015-08-02T19:00:00-07:00', '2015-08-02T20:00:00-07:00', '2015-08-02T21:00:00-07:00', '2015-08-02T22:00:00-07:00', '2015-08-02T23:00:00-07:00', '2015-08-03T00:00:00-07:00', '2015-08-03T01:00:00-07:00', '2015-08-03T02:00:00-07:00', '2015-08-03T03:00:00-07:00', '2015-08-03T04:00:00-07:00', '2015-08-03T05:00:00-07:00', '2015-08-03T06:00:00-07:00', '2015-08-03T07:00:00-07:00', '2015-08-03T08:00:00-07:00', '2015-08-03T09:00:00-07:00', '2015-08-03T10:00:00-07:00', '2015-08-03T11:00:00-07:00', '2015-08-03T12:00:00-07:00', '2015-08-03T13:00:00-07:00', '2015-08-03T14:00:00-07:00', '2015-08-03T15:00:00-07:00', '2015-08-03T16:00:00-07:00', '2015-08-03T17:00:00-07:00', '2015-08-03T18:00:00-07:00', '2015-08-03T19:00:00-07:00', '2015-08-03T20:00:00-07:00', '2015-08-03T21:00:00-07:00', '2015-08-03T22:00:00-07:00', '2015-08-03T23:00:00-07:00', '2015-08-04T00:00:00-07:00', '2015-08-04T01:00:00-07:00', '2015-08-04T02:00:00-07:00', '2015-08-04T03:00:00-07:00', '2015-08-04T04:00:00-07:00', '2015-08-04T05:00:00-07:00', '2015-08-04T06:00:00-07:00', '2015-08-04T07:00:00-07:00', '2015-08-04T08:00:00-07:00', '2015-08-04T09:00:00-07:00', '2015-08-04T10:00:00-07:00', '2015-08-04T11:00:00-07:00', '2015-08-04T12:00:00-07:00', '2015-08-04T13:00:00-07:00', '2015-08-04T14:00:00-07:00', '2015-08-04T15:00:00-07:00', '2015-08-04T16:00:00-07:00', '2015-08-04T17:00:00-07:00', '2015-08-04T18:00:00-07:00', '2015-08-04T19:00:00-07:00', '2015-08-04T20:00:00-07:00', '2015-08-04T21:00:00-07:00', '2015-08-04T22:00:00-07:00', '2015-08-04T23:00:00-07:00', '2015-08-05T00:00:00-07:00', '2015-08-05T01:00:00-07:00', '2015-08-05T02:00:00-07:00', '2015-08-05T03:00:00-07:00', '2015-08-05T04:00:00-07:00', '2015-08-05T05:00:00-07:00', '2015-08-05T06:00:00-07:00', '2015-08-05T07:00:00-07:00', '2015-08-05T08:00:00-07:00', '2015-08-05T09:00:00-07:00', '2015-08-05T10:00:00-07:00', '2015-08-05T11:00:00-07:00', '2015-08-05T12:00:00-07:00', '2015-08-05T13:00:00-07:00', '2015-08-05T14:00:00-07:00', '2015-08-05T15:00:00-07:00', '2015-08-05T16:00:00-07:00', '2015-08-05T17:00:00-07:00', '2015-08-05T18:00:00-07:00', '2015-08-05T19:00:00-07:00', '2015-08-05T20:00:00-07:00', '2015-08-05T21:00:00-07:00', '2015-08-05T22:00:00-07:00', '2015-08-05T23:00:00-07:00', '2015-08-06T00:00:00-07:00', '2015-08-06T01:00:00-07:00', '2015-08-06T02:00:00-07:00', '2015-08-06T03:00:00-07:00', '2015-08-06T04:00:00-07:00', '2015-08-06T05:00:00-07:00', '2015-08-06T06:00:00-07:00', '2015-08-06T07:00:00-07:00', '2015-08-06T08:00:00-07:00', '2015-08-06T09:00:00-07:00', '2015-08-06T10:00:00-07:00', '2015-08-06T11:00:00-07:00', '2015-08-06T12:00:00-07:00', '2015-08-06T13:00:00-07:00', '2015-08-06T14:00:00-07:00', '2015-08-06T15:00:00-07:00', '2015-08-06T16:00:00-07:00', '2015-08-06T17:00:00-07:00', '2015-08-06T18:00:00-07:00', '2015-08-06T19:00:00-07:00', '2015-08-06T20:00:00-07:00', '2015-08-06T21:00:00-07:00', '2015-08-06T22:00:00-07:00', '2015-08-06T23:00:00-07:00', '2015-08-07T00:00:00-07:00', '2015-08-07T01:00:00-07:00', '2015-08-07T02:00:00-07:00', '2015-08-07T03:00:00-07:00', '2015-08-07T04:00:00-07:00', '2015-08-07T05:00:00-07:00', '2015-08-07T06:00:00-07:00', '2015-08-07T07:00:00-07:00', '2015-08-07T08:00:00-07:00', '2015-08-07T09:00:00-07:00', '2015-08-07T10:00:00-07:00', '2015-08-07T11:00:00-07:00', '2015-08-07T12:00:00-07:00', '2015-08-07T13:00:00-07:00', '2015-08-07T14:00:00-07:00', '2015-08-07T15:00:00-07:00']
number of start-valid-time entries = 168

end-valid-time = ['2015-07-31T17:00:00-07:00', '2015-07-31T18:00:00-07:00', '2015-07-31T19:00:00-07:00', '2015-07-31T20:00:00-07:00', '2015-07-31T21:00:00-07:00', '2015-07-31T22:00:00-07:00', '2015-07-31T23:00:00-07:00', '2015-08-01T00:00:00-07:00', '2015-08-01T01:00:00-07:00', '2015-08-01T02:00:00-07:00', '2015-08-01T03:00:00-07:00', '2015-08-01T04:00:00-07:00', '2015-08-01T05:00:00-07:00', '2015-08-01T06:00:00-07:00', '2015-08-01T07:00:00-07:00', '2015-08-01T08:00:00-07:00', '2015-08-01T09:00:00-07:00', '2015-08-01T10:00:00-07:00', '2015-08-01T11:00:00-07:00', '2015-08-01T12:00:00-07:00', '2015-08-01T13:00:00-07:00', '2015-08-01T14:00:00-07:00', '2015-08-01T15:00:00-07:00', '2015-08-01T16:00:00-07:00', '2015-08-01T17:00:00-07:00', '2015-08-01T18:00:00-07:00', '2015-08-01T19:00:00-07:00', '2015-08-01T20:00:00-07:00', '2015-08-01T21:00:00-07:00', '2015-08-01T22:00:00-07:00', '2015-08-01T23:00:00-07:00', '2015-08-02T00:00:00-07:00', '2015-08-02T01:00:00-07:00', '2015-08-02T02:00:00-07:00', '2015-08-02T03:00:00-07:00', '2015-08-02T04:00:00-07:00', '2015-08-02T05:00:00-07:00', '2015-08-02T06:00:00-07:00', '2015-08-02T07:00:00-07:00', '2015-08-02T08:00:00-07:00', '2015-08-02T09:00:00-07:00', '2015-08-02T10:00:00-07:00', '2015-08-02T11:00:00-07:00', '2015-08-02T12:00:00-07:00', '2015-08-02T13:00:00-07:00', '2015-08-02T14:00:00-07:00', '2015-08-02T15:00:00-07:00', '2015-08-02T16:00:00-07:00', '2015-08-02T17:00:00-07:00', '2015-08-02T18:00:00-07:00', '2015-08-02T19:00:00-07:00', '2015-08-02T20:00:00-07:00', '2015-08-02T21:00:00-07:00', '2015-08-02T22:00:00-07:00', '2015-08-02T23:00:00-07:00', '2015-08-03T00:00:00-07:00', '2015-08-03T01:00:00-07:00', '2015-08-03T02:00:00-07:00', '2015-08-03T03:00:00-07:00', '2015-08-03T04:00:00-07:00', '2015-08-03T05:00:00-07:00', '2015-08-03T06:00:00-07:00', '2015-08-03T07:00:00-07:00', '2015-08-03T08:00:00-07:00', '2015-08-03T09:00:00-07:00', '2015-08-03T10:00:00-07:00', '2015-08-03T11:00:00-07:00', '2015-08-03T12:00:00-07:00', '2015-08-03T13:00:00-07:00', '2015-08-03T14:00:00-07:00', '2015-08-03T15:00:00-07:00', '2015-08-03T16:00:00-07:00', '2015-08-03T17:00:00-07:00', '2015-08-03T18:00:00-07:00', '2015-08-03T19:00:00-07:00', '2015-08-03T20:00:00-07:00', '2015-08-03T21:00:00-07:00', '2015-08-03T22:00:00-07:00', '2015-08-03T23:00:00-07:00', '2015-08-04T00:00:00-07:00', '2015-08-04T01:00:00-07:00', '2015-08-04T02:00:00-07:00', '2015-08-04T03:00:00-07:00', '2015-08-04T04:00:00-07:00', '2015-08-04T05:00:00-07:00', '2015-08-04T06:00:00-07:00', '2015-08-04T07:00:00-07:00', '2015-08-04T08:00:00-07:00', '2015-08-04T09:00:00-07:00', '2015-08-04T10:00:00-07:00', '2015-08-04T11:00:00-07:00', '2015-08-04T12:00:00-07:00', '2015-08-04T13:00:00-07:00', '2015-08-04T14:00:00-07:00', '2015-08-04T15:00:00-07:00', '2015-08-04T16:00:00-07:00', '2015-08-04T17:00:00-07:00', '2015-08-04T18:00:00-07:00', '2015-08-04T19:00:00-07:00', '2015-08-04T20:00:00-07:00', '2015-08-04T21:00:00-07:00', '2015-08-04T22:00:00-07:00', '2015-08-04T23:00:00-07:00', '2015-08-05T00:00:00-07:00', '2015-08-05T01:00:00-07:00', '2015-08-05T02:00:00-07:00', '2015-08-05T03:00:00-07:00', '2015-08-05T04:00:00-07:00', '2015-08-05T05:00:00-07:00', '2015-08-05T06:00:00-07:00', '2015-08-05T07:00:00-07:00', '2015-08-05T08:00:00-07:00', '2015-08-05T09:00:00-07:00', '2015-08-05T10:00:00-07:00', '2015-08-05T11:00:00-07:00', '2015-08-05T12:00:00-07:00', '2015-08-05T13:00:00-07:00', '2015-08-05T14:00:00-07:00', '2015-08-05T15:00:00-07:00', '2015-08-05T16:00:00-07:00', '2015-08-05T17:00:00-07:00', '2015-08-05T18:00:00-07:00', '2015-08-05T19:00:00-07:00', '2015-08-05T20:00:00-07:00', '2015-08-05T21:00:00-07:00', '2015-08-05T22:00:00-07:00', '2015-08-05T23:00:00-07:00', '2015-08-06T00:00:00-07:00', '2015-08-06T01:00:00-07:00', '2015-08-06T02:00:00-07:00', '2015-08-06T03:00:00-07:00', '2015-08-06T04:00:00-07:00', '2015-08-06T05:00:00-07:00', '2015-08-06T06:00:00-07:00', '2015-08-06T07:00:00-07:00', '2015-08-06T08:00:00-07:00', '2015-08-06T09:00:00-07:00', '2015-08-06T10:00:00-07:00', '2015-08-06T11:00:00-07:00', '2015-08-06T12:00:00-07:00', '2015-08-06T13:00:00-07:00', '2015-08-06T14:00:00-07:00', '2015-08-06T15:00:00-07:00', '2015-08-06T16:00:00-07:00', '2015-08-06T17:00:00-07:00', '2015-08-06T18:00:00-07:00', '2015-08-06T19:00:00-07:00', '2015-08-06T20:00:00-07:00', '2015-08-06T21:00:00-07:00', '2015-08-06T22:00:00-07:00', '2015-08-06T23:00:00-07:00', '2015-08-07T00:00:00-07:00', '2015-08-07T01:00:00-07:00', '2015-08-07T02:00:00-07:00', '2015-08-07T03:00:00-07:00', '2015-08-07T04:00:00-07:00', '2015-08-07T05:00:00-07:00', '2015-08-07T06:00:00-07:00', '2015-08-07T07:00:00-07:00', '2015-08-07T08:00:00-07:00', '2015-08-07T09:00:00-07:00', '2015-08-07T10:00:00-07:00', '2015-08-07T11:00:00-07:00', '2015-08-07T12:00:00-07:00', '2015-08-07T13:00:00-07:00', '2015-08-07T14:00:00-07:00', '2015-08-07T15:00:00-07:00', '2015-08-07T16:00:00-07:00']
number of end-valid-time entries = 168

probability-of-precipitation = ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '23', '23', '23', '23', '23', '23', '23', '23', '23', '23', '23', '23', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '34', '34', '34', '34', '34', '34', '34', '34', '34', '34', '34']
number of probability-of-precipitation entries = 168

hourly-qpf = ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0.0050', '0.0050', '0.0050', '0.0050', '0.0050', '0.0050', '0.0033', '0.0033', '0.0033', '0.0033', '0.0033', '0.0033', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0.0017', '0.0017', '0.0017', '0.0017', '0.0017', '0.0017', '0.0083', '0.0083', '0.0083', '0.0083', '0.0083']
number of hourly-qpf entries = 168

可以使用python xml库:

https://docs.python.org/2/library/xml.etree.elementtree.html

import urllib2
import xml.etree.ElementTree as ET
page = urllib2.urlopen('http://forecast.weather.gov/MapClick.php?lat=47.6062&lon=-122.3321&FcstType=digitalDWML')
page_content = page.read()
root = ET.fromstring(page_content)
for _f in root.itertext():
    ***Do your formatting here***

相关问题 更多 >