Parsing XML (MusicBrainz) in Python

import urllib2
import codecs
import sys
import os
from xml.dom import minidom
import xml.etree.cElementTree as ET

#urlbob = urllib2.urlopen('http://musicbrainz.org/ws/2/artist/72c536dc-7137-4477-a521-567eeb840fa8')
url = 'dylan.xml'

#attempt 1 - using minidom
xmldoc = minidom.parse(url)
itemlist = xmldoc.getElementsByTagName('artist')

#attempt 2 - using ET
tree = ET.parse('dylan.xml')
root = tree.getroot()
for child in root:
    print child.tag, child.attrib

2 Answers

User

Answer 1 · edited 2024-06-01 21:32:01

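from xml.dom import minidom
import xml.etree.cElementTree as ET

# Same setup as in the question
url = 'dylan.xml'
root = ET.parse(url).getroot()

## This prints out the tree as the xml lib sees it
## (I found it made debugging a little easier)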
#def print_xml(node, depth = 0):
#    for child in node:
#        print "\t"*depth + str(child)
#        print_xml(child, depth = depth + 1)
#print_xml(root)

# attempt 1
xmldoc = minidom.parse(url)
genders = xmldoc.getElementsByTagName('gender') # <== you want gender not artist
for gender in genders:
    print gender.firstChild.nodeValue

# attempt 2
ns = "{http://musicbrainz.org/ns/mmd-2.0#}"
xlpath = "./" + ns + "artist/" + ns + "gender"
genders = root.findall(xlpath) # <== xpath was made for this..
for gender in genders:
    print gender.text
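Instead of concatenating the {namespace} string by hand, ElementTree's find/findall also accept a prefix-to-URI mapping. A minimal Python 3 sketch (standard library only), run against a hypothetical, trimmed-down document shaped like the MusicBrainz response rather than the real dylan.xml; the prefix "mb" is an arbitrary choice:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for dylan.xml (the real response has more data).
SAMPLE = """<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <artist type="Person">
    <name>Example Artist</name>
    <gender>Male</gender>
  </artist>
</metadata>"""

# Map a prefix of our own choosing ("mb") to the namespace URI used by the document.
NS = {"mb": "http://musicbrainz.org/ns/mmd-2.0#"}

root = ET.fromstring(SAMPLE)  # root is the <metadata> element

# Equivalent to "./{ns}artist/{ns}gender", written with the prefix map.
genders = [g.text for g in root.findall("mb:artist/mb:gender", NS)]
print(genders)  # ['Male']
```

The prefix only exists in your Python code; it does not have to match anything in the XML file.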

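So... the problem with your first attempt is that you are looking at a list of all the artist elements rather than the gender elements (gender is a child of the only artist element in that list).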
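To see the difference concretely, here is a minimal Python 3 sketch with minidom. It parses a hypothetical, trimmed-down stand-in for dylan.xml from a string instead of reading the file:

```python
from xml.dom import minidom

# Hypothetical, trimmed-down stand-in for dylan.xml.
SAMPLE = """<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <artist type="Person">
    <name>Example Artist</name>
    <gender>Male</gender>
  </artist>
</metadata>"""

xmldoc = minidom.parseString(SAMPLE)

# getElementsByTagName searches the whole document for elements with that tag name.
artists = xmldoc.getElementsByTagName("artist")  # the <artist> elements themselves
genders = xmldoc.getElementsByTagName("gender")  # what the question actually wants

print(len(artists), artists[0].tagName)  # 1 artist
# The text lives in a child Text node, hence firstChild.nodeValue.
values = [g.firstChild.nodeValue for g in genders]
print(values)  # ['Male']
```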

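The problem with your second attempt is that you are looking at the list of children of the root element. The root is the metadata element, and its child list holds only the single artist element, so the loop never reaches gender.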
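A minimal Python 3 sketch of that difference, on a hypothetical, trimmed-down stand-in for dylan.xml. The helper local() is only there for readable output; it strips the {namespace} prefix that ElementTree puts on every tag:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for dylan.xml.
SAMPLE = """<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <artist type="Person">
    <name>Example Artist</name>
    <gender>Male</gender>
  </artist>
</metadata>"""

def local(tag):
    """Turn '{namespace-uri}name' into 'name'."""
    return tag.split("}", 1)[-1]

root = ET.fromstring(SAMPLE)

# Iterating an Element yields only its direct children.
direct = [local(child.tag) for child in root]
# Element.iter() yields the element itself and every descendant, depth first.
everything = [local(node.tag) for node in root.iter()]

print(direct)      # ['artist']
print(everything)  # ['metadata', 'artist', 'name', 'gender']
```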

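The basic structure is:

<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
    <artist ...>
        <gender>...</gender>
    </artist>
</metadata>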

So you need to go metadata (the root) -> artist -> gender, or just search for the node you actually want (gender, in this case).
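For the second option, searching for the node directly, ElementTree has two standard ways to match a tag at any depth. A minimal Python 3 sketch on a hypothetical, trimmed-down sample document (not the real dylan.xml):

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for dylan.xml.
SAMPLE = """<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <artist type="Person">
    <name>Example Artist</name>
    <gender>Male</gender>
  </artist>
</metadata>"""

MB = "{http://musicbrainz.org/ns/mmd-2.0#}"  # ElementTree's {uri}tag notation
root = ET.fromstring(SAMPLE)

# Element.iter(tag) filters the depth-first walk down to one tag name.
via_iter = [g.text for g in root.iter(MB + "gender")]
# In ElementTree's XPath subset, ".//" means "at any depth below this element".
via_findall = [g.text for g in root.findall(".//" + MB + "gender")]

print(via_iter, via_findall)  # ['Male'] ['Male']
```

Spelling out the full path is stricter (it fails loudly if the structure changes), while searching at any depth is more forgiving; which is better depends on how much you trust the document layout.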

User

Answer 2 · edited 2024-06-01 21:32:01

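This is because you are looping over root, which is just the root of the tree. Does that make sense? Looping over the root only returns its direct children and stops there; it does not go any deeper.

You need to loop over an iterator that keeps returning the next node all the way down the tree, so that you get every result. See the following: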

import xml.etree.cElementTree as ET

tree = ET.parse('dylan.xml')
root = tree.getroot()

# loop the root iterable which will keep returning next node
for node in root.iter(): # or root.getiterator() if < Python 2.7
    print node.tag, node.attrib, node.text
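On Python 3.9 and later, xml.etree.cElementTree and getiterator() no longer exist, so the loop above needs xml.etree.ElementTree and iter(). A sketch of the same loop under that assumption, with io.StringIO standing in for dylan.xml (the sample document is hypothetical) and whitespace-only text stripped for readability:

```python
import io
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

# Hypothetical, trimmed-down stand-in for dylan.xml.
SAMPLE = """<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <artist type="Person">
    <name>Example Artist</name>
    <gender>Male</gender>
  </artist>
</metadata>"""

# ET.parse accepts a filename or a file object; a real script would pass 'dylan.xml'.
tree = ET.parse(io.StringIO(SAMPLE))
root = tree.getroot()

rows = []
for node in root.iter():  # getiterator() was removed in Python 3.9
    text = (node.text or "").strip()  # container elements hold only whitespace
    rows.append((node.tag, node.attrib, text))
    print(node.tag, node.attrib, text)
```

Note that the xmlns declaration does not show up in attrib; ElementTree folds the namespace into each tag name instead.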

