Parsing XML to a .txt file in Python

2 answers

User

#1 · Edited 2024-05-20 00:05:27

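I suggest using xmltodict to parse and extract the data from your XML. It converts XML into a Python dict with the same nesting as the XML source, so it is simple to use, and working with the result feels natural to anyone familiar with Python syntax. Python dicts are fully general, meaning they can express heterogeneous and nested data structures. For example, the Pickling Tools Library relies on Python dicts for data interoperability between Python, C++ and Java, and provides tools for converting XML to dicts. The advantage of xmltodict is that it is a small, fast, standalone module that does one thing: convert XML to dicts.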

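As an example of xmltodict usage, the script below downloads the XML document at the URL shown in the code and extracts its creation date along with the lists of probability-of-precipitation and hourly-qpf values: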

import requests
import xmltodict

# Download the forecast XML
url = 'http://forecast.weather.gov/MapClick.php?lat=47.6062&lon=-122.3321&FcstType=digitalDWML'
r = requests.get(url)

# Convert the XML text into nested Python dicts and lists
result = xmltodict.parse(r.text)
cd = result['dwml']['head']['product']['creation-date']['#text']
print("creation-date =",cd)
pop = result['dwml']['data']['parameters']['probability-of-precipitation']['value']
print("\nprobability-of-precipitation =", pop)
hqpf = result['dwml']['data']['parameters']['hourly-qpf']['value']
print("\nhourly-qpf =", hqpf)

Here is the output from running this script (on 2015-07-30):

[output not preserved in this copy]

xmltodict can be installed with "pip install xmltodict". It was developed by martinblech, and its GitHub project is at https://github.com/martinblech/xmltodict.

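To access start-valid-time and end-valid-time, it helps to understand their data structure and where they sit in the document. Both are series of values held in identically named tags, so intuitively each series should end up as a single list stored under one key, just like probability-of-precipitation and hourly-qpf. You can confirm this by printing the whole result dict and checking how start-valid-time and end-valid-time appear in it. Pretty-printing makes that easier: import pprint, then run pprint.pprint(result). For this XML document, pretty-printing the equivalent dict produces more than 2000 lines, but start-valid-time begins on line 26, and its value is clearly a list: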

[pretty-printed excerpt not preserved in this copy]
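To see why repeated tags come back as a list, here is a minimal sketch on a small inline XML string. The sample XML and the `refresh-frequency` attribute value are made up for illustration and only imitate the shape of the DWML document. The `force_list` option of `xmltodict.parse` is not used in the answer above; it is shown here as one possible way to get a list even when a tag occurs only once.

```python
import xmltodict

# Hypothetical sample that imitates the shape of the DWML document.
sample = """
<dwml>
  <head>
    <product>
      <creation-date refresh-frequency="PT1H">2015-07-31T14:20:30-07:00</creation-date>
    </product>
  </head>
  <data>
    <time-layout>
      <start-valid-time>2015-07-31T16:00:00-07:00</start-valid-time>
      <start-valid-time>2015-07-31T17:00:00-07:00</start-valid-time>
      <end-valid-time>2015-07-31T17:00:00-07:00</end-valid-time>
    </time-layout>
  </data>
</dwml>
"""

result = xmltodict.parse(sample)

# An element with attributes becomes a dict: attributes get an '@' prefix
# and the element's own text is stored under '#text'.
cd = result['dwml']['head']['product']['creation-date']
print(cd['@refresh-frequency'], cd['#text'])

layout = result['dwml']['data']['time-layout']
# Repeated sibling tags are collected into one list ...
print(layout['start-valid-time'])
# ... but a tag that occurs only once is a plain string, not a list.
print(layout['end-valid-time'])

# force_list makes the named tags lists even when they occur once.
forced = xmltodict.parse(sample, force_list=('end-valid-time',))
forced_evt = forced['dwml']['data']['time-layout']['end-valid-time']
print(forced_evt)
```

The single-occurrence case matters if the same extraction code is run against a document that happens to contain only one entry: `len()` would then count characters of a string rather than list entries.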

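The script below extracts and prints the creation date as a scalar value, then all start-valid-time values, all end-valid-time values, all probability-of-precipitation values and all hourly-qpf values as lists, and prints the length of each extracted list: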

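import requests
import xmltodict

# Download the same forecast XML as in the first script
url = 'http://forecast.weather.gov/MapClick.php?lat=47.6062&lon=-122.3321&FcstType=digitalDWML'
r = requests.get(url)
result = xmltodict.parse(r.text)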

cd = result['dwml']['head']['product']['creation-date']['#text']
print("creation-date =",cd)

svt = result['dwml']['data']['time-layout']['start-valid-time']
print("\nstart-valid-time =", svt)
print("number of start-valid-time entries =", len(svt))

evt = result['dwml']['data']['time-layout']['end-valid-time']
print("\nend-valid-time =", evt)
print("number of end-valid-time entries =", len(evt))

pop = result['dwml']['data']['parameters']['probability-of-precipitation']['value']
print("\nprobability-of-precipitation =", pop)
print("number of probability-of-precipitation entries =", len(pop))

hqpf = result['dwml']['data']['parameters']['hourly-qpf']['value']
print("\nhourly-qpf =", hqpf)
print("number of hourly-qpf entries =", len(hqpf))

Here is the output from running this script (on 2015-07-31):

creation-date = 2015-07-31T14:20:30-07:00

start-valid-time = ['2015-07-31T16:00:00-07:00', '2015-07-31T17:00:00-07:00', '2015-07-31T18:00:00-07:00', '2015-07-31T19:00:00-07:00', '2015-07-31T20:00:00-07:00', '2015-07-31T21:00:00-07:00', '2015-07-31T22:00:00-07:00', '2015-07-31T23:00:00-07:00', '2015-08-01T00:00:00-07:00', '2015-08-01T01:00:00-07:00', '2015-08-01T02:00:00-07:00', '2015-08-01T03:00:00-07:00', '2015-08-01T04:00:00-07:00', '2015-08-01T05:00:00-07:00', '2015-08-01T06:00:00-07:00', '2015-08-01T07:00:00-07:00', '2015-08-01T08:00:00-07:00', '2015-08-01T09:00:00-07:00', '2015-08-01T10:00:00-07:00', '2015-08-01T11:00:00-07:00', '2015-08-01T12:00:00-07:00', '2015-08-01T13:00:00-07:00', '2015-08-01T14:00:00-07:00', '2015-08-01T15:00:00-07:00', '2015-08-01T16:00:00-07:00', '2015-08-01T17:00:00-07:00', '2015-08-01T18:00:00-07:00', '2015-08-01T19:00:00-07:00', '2015-08-01T20:00:00-07:00', '2015-08-01T21:00:00-07:00', '2015-08-01T22:00:00-07:00', '2015-08-01T23:00:00-07:00', '2015-08-02T00:00:00-07:00', '2015-08-02T01:00:00-07:00', '2015-08-02T02:00:00-07:00', '2015-08-02T03:00:00-07:00', '2015-08-02T04:00:00-07:00', '2015-08-02T05:00:00-07:00', '2015-08-02T06:00:00-07:00', '2015-08-02T07:00:00-07:00', '2015-08-02T08:00:00-07:00', '2015-08-02T09:00:00-07:00', '2015-08-02T10:00:00-07:00', '2015-08-02T11:00:00-07:00', '2015-08-02T12:00:00-07:00', '2015-08-02T13:00:00-07:00', '2015-08-02T14:00:00-07:00', '2015-08-02T15:00:00-07:00', '2015-08-02T16:00:00-07:00', '2015-08-02T17:00:00-07:00', '2015-08-02T18:00:00-07:00', '2015-08-02T19:00:00-07:00', '2015-08-02T20:00:00-07:00', '2015-08-02T21:00:00-07:00', '2015-08-02T22:00:00-07:00', '2015-08-02T23:00:00-07:00', '2015-08-03T00:00:00-07:00', '2015-08-03T01:00:00-07:00', '2015-08-03T02:00:00-07:00', '2015-08-03T03:00:00-07:00', '2015-08-03T04:00:00-07:00', '2015-08-03T05:00:00-07:00', '2015-08-03T06:00:00-07:00', '2015-08-03T07:00:00-07:00', '2015-08-03T08:00:00-07:00', '2015-08-03T09:00:00-07:00', '2015-08-03T10:00:00-07:00', '2015-08-03T11:00:00-07:00', 
'2015-08-03T12:00:00-07:00', '2015-08-03T13:00:00-07:00', '2015-08-03T14:00:00-07:00', '2015-08-03T15:00:00-07:00', '2015-08-03T16:00:00-07:00', '2015-08-03T17:00:00-07:00', '2015-08-03T18:00:00-07:00', '2015-08-03T19:00:00-07:00', '2015-08-03T20:00:00-07:00', '2015-08-03T21:00:00-07:00', '2015-08-03T22:00:00-07:00', '2015-08-03T23:00:00-07:00', '2015-08-04T00:00:00-07:00', '2015-08-04T01:00:00-07:00', '2015-08-04T02:00:00-07:00', '2015-08-04T03:00:00-07:00', '2015-08-04T04:00:00-07:00', '2015-08-04T05:00:00-07:00', '2015-08-04T06:00:00-07:00', '2015-08-04T07:00:00-07:00', '2015-08-04T08:00:00-07:00', '2015-08-04T09:00:00-07:00', '2015-08-04T10:00:00-07:00', '2015-08-04T11:00:00-07:00', '2015-08-04T12:00:00-07:00', '2015-08-04T13:00:00-07:00', '2015-08-04T14:00:00-07:00', '2015-08-04T15:00:00-07:00', '2015-08-04T16:00:00-07:00', '2015-08-04T17:00:00-07:00', '2015-08-04T18:00:00-07:00', '2015-08-04T19:00:00-07:00', '2015-08-04T20:00:00-07:00', '2015-08-04T21:00:00-07:00', '2015-08-04T22:00:00-07:00', '2015-08-04T23:00:00-07:00', '2015-08-05T00:00:00-07:00', '2015-08-05T01:00:00-07:00', '2015-08-05T02:00:00-07:00', '2015-08-05T03:00:00-07:00', '2015-08-05T04:00:00-07:00', '2015-08-05T05:00:00-07:00', '2015-08-05T06:00:00-07:00', '2015-08-05T07:00:00-07:00', '2015-08-05T08:00:00-07:00', '2015-08-05T09:00:00-07:00', '2015-08-05T10:00:00-07:00', '2015-08-05T11:00:00-07:00', '2015-08-05T12:00:00-07:00', '2015-08-05T13:00:00-07:00', '2015-08-05T14:00:00-07:00', '2015-08-05T15:00:00-07:00', '2015-08-05T16:00:00-07:00', '2015-08-05T17:00:00-07:00', '2015-08-05T18:00:00-07:00', '2015-08-05T19:00:00-07:00', '2015-08-05T20:00:00-07:00', '2015-08-05T21:00:00-07:00', '2015-08-05T22:00:00-07:00', '2015-08-05T23:00:00-07:00', '2015-08-06T00:00:00-07:00', '2015-08-06T01:00:00-07:00', '2015-08-06T02:00:00-07:00', '2015-08-06T03:00:00-07:00', '2015-08-06T04:00:00-07:00', '2015-08-06T05:00:00-07:00', '2015-08-06T06:00:00-07:00', '2015-08-06T07:00:00-07:00', 
'2015-08-06T08:00:00-07:00', '2015-08-06T09:00:00-07:00', '2015-08-06T10:00:00-07:00', '2015-08-06T11:00:00-07:00', '2015-08-06T12:00:00-07:00', '2015-08-06T13:00:00-07:00', '2015-08-06T14:00:00-07:00', '2015-08-06T15:00:00-07:00', '2015-08-06T16:00:00-07:00', '2015-08-06T17:00:00-07:00', '2015-08-06T18:00:00-07:00', '2015-08-06T19:00:00-07:00', '2015-08-06T20:00:00-07:00', '2015-08-06T21:00:00-07:00', '2015-08-06T22:00:00-07:00', '2015-08-06T23:00:00-07:00', '2015-08-07T00:00:00-07:00', '2015-08-07T01:00:00-07:00', '2015-08-07T02:00:00-07:00', '2015-08-07T03:00:00-07:00', '2015-08-07T04:00:00-07:00', '2015-08-07T05:00:00-07:00', '2015-08-07T06:00:00-07:00', '2015-08-07T07:00:00-07:00', '2015-08-07T08:00:00-07:00', '2015-08-07T09:00:00-07:00', '2015-08-07T10:00:00-07:00', '2015-08-07T11:00:00-07:00', '2015-08-07T12:00:00-07:00', '2015-08-07T13:00:00-07:00', '2015-08-07T14:00:00-07:00', '2015-08-07T15:00:00-07:00']
number of start-valid-time entries = 168

end-valid-time = ['2015-07-31T17:00:00-07:00', '2015-07-31T18:00:00-07:00', '2015-07-31T19:00:00-07:00', '2015-07-31T20:00:00-07:00', '2015-07-31T21:00:00-07:00', '2015-07-31T22:00:00-07:00', '2015-07-31T23:00:00-07:00', '2015-08-01T00:00:00-07:00', '2015-08-01T01:00:00-07:00', '2015-08-01T02:00:00-07:00', '2015-08-01T03:00:00-07:00', '2015-08-01T04:00:00-07:00', '2015-08-01T05:00:00-07:00', '2015-08-01T06:00:00-07:00', '2015-08-01T07:00:00-07:00', '2015-08-01T08:00:00-07:00', '2015-08-01T09:00:00-07:00', '2015-08-01T10:00:00-07:00', '2015-08-01T11:00:00-07:00', '2015-08-01T12:00:00-07:00', '2015-08-01T13:00:00-07:00', '2015-08-01T14:00:00-07:00', '2015-08-01T15:00:00-07:00', '2015-08-01T16:00:00-07:00', '2015-08-01T17:00:00-07:00', '2015-08-01T18:00:00-07:00', '2015-08-01T19:00:00-07:00', '2015-08-01T20:00:00-07:00', '2015-08-01T21:00:00-07:00', '2015-08-01T22:00:00-07:00', '2015-08-01T23:00:00-07:00', '2015-08-02T00:00:00-07:00', '2015-08-02T01:00:00-07:00', '2015-08-02T02:00:00-07:00', '2015-08-02T03:00:00-07:00', '2015-08-02T04:00:00-07:00', '2015-08-02T05:00:00-07:00', '2015-08-02T06:00:00-07:00', '2015-08-02T07:00:00-07:00', '2015-08-02T08:00:00-07:00', '2015-08-02T09:00:00-07:00', '2015-08-02T10:00:00-07:00', '2015-08-02T11:00:00-07:00', '2015-08-02T12:00:00-07:00', '2015-08-02T13:00:00-07:00', '2015-08-02T14:00:00-07:00', '2015-08-02T15:00:00-07:00', '2015-08-02T16:00:00-07:00', '2015-08-02T17:00:00-07:00', '2015-08-02T18:00:00-07:00', '2015-08-02T19:00:00-07:00', '2015-08-02T20:00:00-07:00', '2015-08-02T21:00:00-07:00', '2015-08-02T22:00:00-07:00', '2015-08-02T23:00:00-07:00', '2015-08-03T00:00:00-07:00', '2015-08-03T01:00:00-07:00', '2015-08-03T02:00:00-07:00', '2015-08-03T03:00:00-07:00', '2015-08-03T04:00:00-07:00', '2015-08-03T05:00:00-07:00', '2015-08-03T06:00:00-07:00', '2015-08-03T07:00:00-07:00', '2015-08-03T08:00:00-07:00', '2015-08-03T09:00:00-07:00', '2015-08-03T10:00:00-07:00', '2015-08-03T11:00:00-07:00', '2015-08-03T12:00:00-07:00', 
'2015-08-03T13:00:00-07:00', '2015-08-03T14:00:00-07:00', '2015-08-03T15:00:00-07:00', '2015-08-03T16:00:00-07:00', '2015-08-03T17:00:00-07:00', '2015-08-03T18:00:00-07:00', '2015-08-03T19:00:00-07:00', '2015-08-03T20:00:00-07:00', '2015-08-03T21:00:00-07:00', '2015-08-03T22:00:00-07:00', '2015-08-03T23:00:00-07:00', '2015-08-04T00:00:00-07:00', '2015-08-04T01:00:00-07:00', '2015-08-04T02:00:00-07:00', '2015-08-04T03:00:00-07:00', '2015-08-04T04:00:00-07:00', '2015-08-04T05:00:00-07:00', '2015-08-04T06:00:00-07:00', '2015-08-04T07:00:00-07:00', '2015-08-04T08:00:00-07:00', '2015-08-04T09:00:00-07:00', '2015-08-04T10:00:00-07:00', '2015-08-04T11:00:00-07:00', '2015-08-04T12:00:00-07:00', '2015-08-04T13:00:00-07:00', '2015-08-04T14:00:00-07:00', '2015-08-04T15:00:00-07:00', '2015-08-04T16:00:00-07:00', '2015-08-04T17:00:00-07:00', '2015-08-04T18:00:00-07:00', '2015-08-04T19:00:00-07:00', '2015-08-04T20:00:00-07:00', '2015-08-04T21:00:00-07:00', '2015-08-04T22:00:00-07:00', '2015-08-04T23:00:00-07:00', '2015-08-05T00:00:00-07:00', '2015-08-05T01:00:00-07:00', '2015-08-05T02:00:00-07:00', '2015-08-05T03:00:00-07:00', '2015-08-05T04:00:00-07:00', '2015-08-05T05:00:00-07:00', '2015-08-05T06:00:00-07:00', '2015-08-05T07:00:00-07:00', '2015-08-05T08:00:00-07:00', '2015-08-05T09:00:00-07:00', '2015-08-05T10:00:00-07:00', '2015-08-05T11:00:00-07:00', '2015-08-05T12:00:00-07:00', '2015-08-05T13:00:00-07:00', '2015-08-05T14:00:00-07:00', '2015-08-05T15:00:00-07:00', '2015-08-05T16:00:00-07:00', '2015-08-05T17:00:00-07:00', '2015-08-05T18:00:00-07:00', '2015-08-05T19:00:00-07:00', '2015-08-05T20:00:00-07:00', '2015-08-05T21:00:00-07:00', '2015-08-05T22:00:00-07:00', '2015-08-05T23:00:00-07:00', '2015-08-06T00:00:00-07:00', '2015-08-06T01:00:00-07:00', '2015-08-06T02:00:00-07:00', '2015-08-06T03:00:00-07:00', '2015-08-06T04:00:00-07:00', '2015-08-06T05:00:00-07:00', '2015-08-06T06:00:00-07:00', '2015-08-06T07:00:00-07:00', '2015-08-06T08:00:00-07:00', 
'2015-08-06T09:00:00-07:00', '2015-08-06T10:00:00-07:00', '2015-08-06T11:00:00-07:00', '2015-08-06T12:00:00-07:00', '2015-08-06T13:00:00-07:00', '2015-08-06T14:00:00-07:00', '2015-08-06T15:00:00-07:00', '2015-08-06T16:00:00-07:00', '2015-08-06T17:00:00-07:00', '2015-08-06T18:00:00-07:00', '2015-08-06T19:00:00-07:00', '2015-08-06T20:00:00-07:00', '2015-08-06T21:00:00-07:00', '2015-08-06T22:00:00-07:00', '2015-08-06T23:00:00-07:00', '2015-08-07T00:00:00-07:00', '2015-08-07T01:00:00-07:00', '2015-08-07T02:00:00-07:00', '2015-08-07T03:00:00-07:00', '2015-08-07T04:00:00-07:00', '2015-08-07T05:00:00-07:00', '2015-08-07T06:00:00-07:00', '2015-08-07T07:00:00-07:00', '2015-08-07T08:00:00-07:00', '2015-08-07T09:00:00-07:00', '2015-08-07T10:00:00-07:00', '2015-08-07T11:00:00-07:00', '2015-08-07T12:00:00-07:00', '2015-08-07T13:00:00-07:00', '2015-08-07T14:00:00-07:00', '2015-08-07T15:00:00-07:00', '2015-08-07T16:00:00-07:00']
number of end-valid-time entries = 168

probability-of-precipitation = ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '10', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '23', '23', '23', '23', '23', '23', '23', '23', '23', '23', '23', '23', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '12', '34', '34', '34', '34', '34', '34', '34', '34', '34', '34', '34']
number of probability-of-precipitation entries = 168

hourly-qpf = ['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0.0050', '0.0050', '0.0050', '0.0050', '0.0050', '0.0050', '0.0033', '0.0033', '0.0033', '0.0033', '0.0033', '0.0033', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0.0017', '0.0017', '0.0017', '0.0017', '0.0017', '0.0017', '0.0083', '0.0083', '0.0083', '0.0083', '0.0083']
number of hourly-qpf entries = 168
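Since the question asks for a .txt file, the four lists above can be written out one forecast hour per line. The following is only a sketch: the function name `write_forecast`, the file name `forecast.txt` and the tab-separated layout are assumptions chosen for illustration, and the short lists stand in for the 168-entry lists printed above.

```python
def write_forecast(path, start_times, end_times, pop, hqpf):
    """Write one tab-separated line per forecast hour; return the row count."""
    columns = (start_times, end_times, pop, hqpf)
    # Guard against misaligned lists; zip() would silently truncate them.
    if len({len(c) for c in columns}) != 1:
        raise ValueError("all four lists must have the same length")
    with open(path, "w", encoding="utf-8") as f:
        f.write("start-valid-time\tend-valid-time\tpop\thourly-qpf\n")
        for row in zip(*columns):
            # str() keeps this working if a value is not already a string
            f.write("\t".join(str(v) for v in row) + "\n")
    return len(start_times)

# Stand-in data: the first two entries of each list from the output above.
svt = ['2015-07-31T16:00:00-07:00', '2015-07-31T17:00:00-07:00']
evt = ['2015-07-31T17:00:00-07:00', '2015-07-31T18:00:00-07:00']
pop = ['0', '0']
hqpf = ['0', '0']

rows = write_forecast("forecast.txt", svt, evt, pop, hqpf)
print(rows, "rows written")
```

In the real script, the `svt`, `evt`, `pop` and `hqpf` variables extracted from `result` would be passed in directly instead of the stand-in lists.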

User

#2 · Edited 2024-05-20 00:05:27

You can use Python's built-in xml library:

https://docs.python.org/2/library/xml.etree.elementtree.html

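# Python 3 version; the original answer used Python 2's urllib2
from urllib.request import urlopen
import xml.etree.ElementTree as ET

url = 'http://forecast.weather.gov/MapClick.php?lat=47.6062&lon=-122.3321&FcstType=digitalDWML'
page_content = urlopen(url).read()
root = ET.fromstring(page_content)

# itertext() yields the text of every element in document order
for text in root.itertext():
    text = text.strip()
    if text:
        print(text)  # do your formatting here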
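Note that itertext() yields every text node without saying which tag it came from. A sketch of a more targeted approach with the same library is shown below, using `iter()` and `find()`/`findall()` to pull out specific tags and write them to a text file. The inline sample XML and the file name `pop.txt` are assumptions for illustration; the sample only imitates the tag names used in the first answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample imitating the DWML structure; the real document is much larger.
sample = """
<dwml>
  <data>
    <time-layout>
      <start-valid-time>2015-07-31T16:00:00-07:00</start-valid-time>
      <start-valid-time>2015-07-31T17:00:00-07:00</start-valid-time>
    </time-layout>
    <parameters>
      <probability-of-precipitation>
        <value>0</value>
        <value>1</value>
      </probability-of-precipitation>
    </parameters>
  </data>
</dwml>
"""

root = ET.fromstring(sample)

# iter(tag) walks the whole tree and yields only elements with that tag.
starts = [e.text for e in root.iter('start-valid-time')]

# find() takes a path relative to the root and returns the first match;
# findall() returns all matching direct children.
pop_parent = root.find('data/parameters/probability-of-precipitation')
pop = [v.text for v in pop_parent.findall('value')]

# Write one "time<TAB>probability" line per forecast hour.
with open("pop.txt", "w", encoding="utf-8") as f:
    for start, p in zip(starts, pop):
        f.write(f"{start}\t{p}\n")
```

With the downloaded page, `ET.fromstring(page_content)` would replace the inline sample, assuming the live document uses the same tag paths.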
