Accessing nested children in an XML file parsed with ElementTree

Published 2024-06-28 20:25:59

I am new to XML parsing. This XML file has the following tree:

FHRSEstablishment
 |--> Header
 |    |--> ...
 |--> EstablishmentCollection
 |    |--> EstablishmentDetail
 |    |    |-->...
 |    |--> Scores
 |    |    |-->...
 |--> EstablishmentCollection
 |    |--> EstablishmentDetail
 |    |    |-->...
 |    |--> Scores
 |    |    |-->...

But when I access it with ElementTree and look at the child tags and attributes:

import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(
   urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root:
   print child.tag, child.attrib

I only get:

Header {}
EstablishmentCollection {}

I take this to mean that their attributes are empty. Why is that, and how can I access the child elements nested inside EstablishmentDetail and Scores?

Edit

Thanks to the answer below I can get into the tree, but if I want to retrieve values such as the ones in Scores, this fails:

for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating 

and produces

None
None
None

Why?


2 Answers

Hope this helps:

import xml.etree.ElementTree as etree
with open('filename.xml') as tmpfile:
    # iterparse yields (event, element) pairs while the file is being read
    doc = iter(etree.iterparse(tmpfile, events=("start", "end")))
    # the first "start" event carries the root element
    event, root = next(doc)
    for event, elem in doc:
        print event, elem
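
For anyone who wants to try the streaming approach without downloading anything, here is a minimal sketch in Python 3 syntax. The SAMPLE document and the closed_tags helper are made up for illustration; the sample only imitates the tree from the question and is not the real feed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the tree in the question (not the real feed).
SAMPLE = b"""<FHRSEstablishment>
  <Header><ItemCount>1</ItemCount></Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""


def closed_tags(source):
    """Return tag names in the order their closing tags are parsed."""
    tags = []
    # An "end" event fires once an element and all of its children are read.
    for event, elem in ET.iterparse(source, events=("end",)):
        tags.append(elem.tag)
    return tags


tags = closed_tags(io.BytesIO(SAMPLE))
print(tags)
```

Because "end" events fire when an element is complete, nested elements show up before their parents, and the root comes last.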

You have to iterate over everything under your root, not only its direct children.

root.iter() does the trick!

import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root.iter():
   print child.tag, child.attrib

Output:

FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
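
To see the difference between looping over root directly and using root.iter(), here is a small sketch in Python 3 syntax. The SAMPLE string is a made-up document that imitates the tree from the question, so it runs without a network connection.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the tree in the question (not the real feed).
SAMPLE = """<FHRSEstablishment>
  <Header><ItemCount>1</ItemCount></Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""

root = ET.fromstring(SAMPLE)

# Iterating an element visits its direct children only.
direct = [child.tag for child in root]

# iter() walks the whole subtree in document order, starting with root itself.
everything = [elem.tag for elem in root.iter()]

print(direct)
print(everything)
```

The empty {} in the printed output only says that an element carries no XML attributes. It does not mean the element has no children or no text.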
  • To get all the tags inside EstablishmentDetail, you need to find that tag and then loop through its children!

For example:

for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib

Output:

FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
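
If you also want the values and not only the tag names, one possible sketch (Python 3 syntax) is to read each child's text. The SAMPLE document and the fields dictionary are invented for illustration; only the tag names come from the output above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the output listed above (not the real feed).
SAMPLE = """<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <FHRSID>1001</FHRSID>
      <BusinessName>Example Cafe</BusinessName>
      <RatingValue>5</RatingValue>
      <Scores>
        <Hygiene>5</Hygiene>
      </Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""

root = ET.fromstring(SAMPLE)

# find() returns only the first match, or None when nothing matches.
detail = root.find('.//EstablishmentDetail')

fields = {}
if detail is not None:
    for child in detail:
        # Keep leaf elements only; Scores has children of its own.
        if len(child) == 0:
            fields[child.tag] = child.text

print(fields)
```

Note that find() stops at the first EstablishmentDetail. To visit every establishment, use findall() or iter('EstablishmentDetail') instead.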
  • To get the Hygiene score you mentioned in the comment:

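What your code does is take the first Scores tag and loop over its children, which are the Hygiene, ConfidenceInManagement and Structural tags, when you call for each in root.find('.//Scores'): rating = each.get('Hygiene'). get() looks up an attribute, and obviously none of those three children has an attribute called Hygiene, so you get None three times!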

You need to first:
 - find all the Scores tags.
 - look for Hygiene inside each tag found.

for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    print '' if rating is None else rating.text

Output:

5
5
5
0
5
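
To make the attribute versus text distinction concrete, here is a sketch in Python 3 syntax that reproduces both the three None values and the working lookup. The SAMPLE document is made up (three establishments, the last with an empty Scores tag), and using findtext() with a default is my own shortcut, not something from the code above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample (not the real feed); the last Scores is empty on purpose.
SAMPLE = """<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <Scores>
        <Hygiene>5</Hygiene>
        <Structural>5</Structural>
        <ConfidenceInManagement>0</ConfidenceInManagement>
      </Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <Scores>
        <Hygiene>10</Hygiene>
        <Structural>5</Structural>
        <ConfidenceInManagement>5</ConfidenceInManagement>
      </Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <Scores/>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""

root = ET.fromstring(SAMPLE)

# What the question did: loop over the children of the FIRST Scores element
# and ask each child for an attribute named Hygiene, which none of them has.
first_scores = root.find('.//Scores')
attr_lookups = [child.get('Hygiene') for child in first_scores]

# The values live in the element text, not in attributes.
first_texts = [child.text for child in first_scores]

# One Hygiene value per Scores element; '' when the child is missing.
ratings = [scores.findtext('Hygiene', default='')
           for scores in root.findall('.//Scores')]

print(attr_lookups)
print(first_texts)
print(ratings)
```

findtext('Hygiene', default='') does the same job as the find() plus "is None" check in the loop above, in a single call.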
