XML parsing problem in Python using xml.etree.ElementTree

Posted 2024-09-22 16:38:59


I have the following XML, returned in an HTTP response:

<?xml version="1.0" encoding="UTF-8"?>
<Response rid="1000" status="succeeded" moreData="false">
  <Results completed="true" total="25" matched="5" processed="25">
      <Resource type="h" DisplayName="Host" name="tango">
          <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
             <PerfData attrId="cpuUsage" attrName="Usage">
                <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="36.00"/>
                <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="86.00"/>
                <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="29.00"/>
             </PerfData>
          <Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango">
              <Time start="2011/12/16/18/46/00" end="2011/12/16/19/46/00"/>
              <PerfData attrId="cpuUsage" attrName="Usage">
                 <Data intr="5" start="2011/12/16/19" end="2011/12/16/19" data="6.00"/>
              </PerfData>
          </Resource>
      </Resource>
  </Results>
</Response>

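If you look closely, the outer <Resource> element contains another element with the same tag.

So the high-level XML structure looks like this: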

<Resource>
    <Resource>
    </Resource>
</Resource>

Python's ElementTree only seems to parse the outer element. Here is my code:

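import re
import xml.etree.ElementTree as xml

# `data` holds the raw text of the HTTP response.
# re.DOTALL lets `.` match newlines, so a multi-line <Response> block is matched.
pattern = re.compile(r'(<Response.*?</Response>)', re.DOTALL)

for match in pattern.finditer(data):
    contents = match.group(1)
    responses = xml.fromstring(contents)

    # iterating an element yields only its direct children
    for results in responses:
        result = results.tag

        for resources in results:
            resource = resources.tag
            temp = resources.attrib
            print(temp)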

This prints the following output (temp):

{'DisplayName': 'Host', 'type': 'h', 'name': 'tango'}

How do I get the attributes of the inner element?


1 answer
User
#1 · Posted 2024-09-22 16:38:59

Don't parse XML with regular expressions! It won't work. Use an XML parsing library instead, for example lxml:

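Edit: following the OP's request in the comments, the code example now fetches only the top-level resources, loops over them, and tries to get their "sub-resources".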

from lxml import etree

content = '''
YOUR XML HERE
'''

root = etree.fromstring(content)

# search for all "top level" resources
resources = root.xpath("//Resource[not(ancestor::Resource)]")
for resource in resources:
    # copy resource attributes in a dict
    mashup = dict(resource.attrib)
    # find child resource elements
    subresources = resource.xpath("./Resource")
    # if we find only one resource, add it to the mashup
    if len(subresources) == 1:
        mashup['resource'] = dict(subresources[0].attrib)
    # else... no idea what the OP wants...

    print(mashup)

This will output:

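{'type': 'h', 'DisplayName': 'Host', 'name': 'tango', 'resource': {'type': 'vm', 'DisplayName': 'VM', 'name': 'charlie', 'baseHost': 'tango'}}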
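
If you would rather stay with the standard library, the same idea can probably be expressed with xml.etree.ElementTree alone. Iterating an element only yields its direct children, which is why the question's loop never reaches the inner <Resource>; a small recursive helper gets around that and also copes with more than one sub-resource. This is only a sketch: the helper names resource_to_dict and top_level_resources, the "resources" key, and the trimmed SAMPLE document are assumptions made for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the response from the question (hypothetical sample).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<Response rid="1000" status="succeeded" moreData="false">
  <Results completed="true" total="25" matched="5" processed="25">
    <Resource type="h" DisplayName="Host" name="tango">
      <Resource type="vm" DisplayName="VM" name="charlie" baseHost="tango"/>
    </Resource>
  </Results>
</Response>
"""


def resource_to_dict(element):
    """Return the element's attributes plus a list of nested Resource dicts."""
    info = dict(element.attrib)
    # findall("Resource") matches direct children only, so recurse for depth
    children = [resource_to_dict(child) for child in element.findall("Resource")]
    if children:
        info["resources"] = children  # hypothetical key name
    return info


def top_level_resources(xml_text):
    """Parse the response and convert every Resource directly under Results."""
    # Passing bytes lets the parser honour the encoding declaration itself.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [resource_to_dict(res) for res in root.findall("./Results/Resource")]


if __name__ == "__main__":
    for entry in top_level_resources(SAMPLE):
        print(entry)
```

Each top-level resource comes back as a dict of its own attributes, with any nested resources collected as a list under "resources", so the inner name and baseHost values are reachable without any regular expressions.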
