How do I extract/parse dictionary elements with Python?

Posted 2024-09-21 00:52:09


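I want to extract just 00s from the decade attribute, but none of my attempts have produced the expected result.

Below is part of my XML file, saved as gorillaz_catalog.xml: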

<CATALOG>
    <CD decade="00s">
        <TITLE>Gorillaz</TITLE>
        <ARTIST>Gorillaz</ARTIST>
        <COUNTRY>UK</COUNTRY>
        <COMPANY>Virgin</COMPANY>
        <PRICE>10.90</PRICE>
        <YEAR>2001</YEAR>
    </CD>
    <CD decade="00s">
        <TITLE>Demon Days</TITLE>
        <ARTIST>Gorillaz</ARTIST>
        <COUNTRY>UK</COUNTRY>
        <COMPANY>Parlaphone</COMPANY>
        <PRICE>9.90</PRICE>
        <YEAR>1988</YEAR>
    </CD>

My expected result is as follows:

Artist: Gorillaz, Album: Gorillaz, Decade: 00s
Artist: Gorillaz, Album: Demon Days, Decade: 00s

and so on through the rest of the XML file.

I tested each part separately and ended up with the following code:

import xml.etree.ElementTree as ET

tree = ET.parse("gorillaz_catalog.xml")
root = tree.getroot()

for ARTIST in root.iter("ARTIST"):
    print("Artist:", ARTIST.text)

for TITLE in root.iter("TITLE"):
    print("Title:", TITLE.text)

for decade in root.iter("CD"):
    print("Decade:", decade.attrib)

For the decade, I keep getting Decade: {'decade': '00s'}, whereas I only want 00s.
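For reference, `attrib` is a plain dictionary holding all of an element's attributes, which is why printing it shows the whole mapping. A minimal sketch of pulling out a single value, assuming a small inline XML string (`SAMPLE_XML` is a made-up stand-in for the file):

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for gorillaz_catalog.xml
SAMPLE_XML = """<CATALOG>
    <CD decade="00s"><TITLE>Gorillaz</TITLE><ARTIST>Gorillaz</ARTIST></CD>
</CATALOG>"""

root = ET.fromstring(SAMPLE_XML)
cd = root.find("CD")

whole_mapping = cd.attrib        # dict of every attribute on the element
by_key = cd.attrib["decade"]     # raises KeyError if the attribute is missing
by_get = cd.get("decade")        # returns None if the attribute is missing

print("Decade:", by_get)
```

`Element.get` is usually the safer choice when some records might lack the attribute.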

Then I tried looping over everything to get the result I want (after commenting out the 3 statements above):

for ARTIST in root.iter("ARTIST"):
    for TITLE in root.iter("TITLE"):
        for decade in root.iter("CD"):
            print("Artist:", ARTIST.text,", Title:", TITLE.text, ", Decade:", decade.attrib)

The result I get repeats every line 20 times:

Artist: Gorillaz , Album: Gorillaz , Decade: {'decade': '00s'}

twenty times (which is the number of records in the file), then

Artist: Gorillaz , Album: Demon Days , Decade: {'decade': '80s'}

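twenty times.

This gives me the lines I want, but I don't need each one 20 times.

  1. Clearly my nested loops are wrong, so how do I get this to produce the lines I want? I think I may need to put these items into a list of dictionaries, but I'm not very familiar with how to do that.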
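The repetition comes from the nesting itself: each loop walks the whole document, so every ARTIST is paired with every TITLE and every CD. One way to avoid that is a single loop over the CD elements, looking up the children of each one. The list-of-dictionaries idea could be sketched like this, assuming a made-up inline `SAMPLE_XML` and illustrative key names (`artist`, `album`, `decade`):

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample standing in for gorillaz_catalog.xml
SAMPLE_XML = """<CATALOG>
    <CD decade="00s">
        <TITLE>Gorillaz</TITLE>
        <ARTIST>Gorillaz</ARTIST>
    </CD>
    <CD decade="00s">
        <TITLE>Demon Days</TITLE>
        <ARTIST>Gorillaz</ARTIST>
    </CD>
</CATALOG>"""

root = ET.fromstring(SAMPLE_XML)

# One dictionary per CD record; children are looked up relative to that CD
records = []
for cd in root.iter("CD"):
    records.append({
        "artist": cd.findtext("ARTIST"),
        "album": cd.findtext("TITLE"),
        "decade": cd.get("decade"),
    })

for rec in records:
    print(f"Artist: {rec['artist']}, Album: {rec['album']}, Decade: {rec['decade']}")
```

Because each dictionary is built from one CD element, the artist, album, and decade on a line always belong to the same record, and each record is printed once.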

2 answers

This is my final code, after going through more of the documentation once I had posted. Thanks, everyone, for the suggestions.

import xml.etree.ElementTree as ET

tree = ET.parse("gorillaz_catalog.xml")
root = tree.getroot()

for item in tree.iterfind("CD"):
    artist = item.findtext("ARTIST")
    title = item.findtext("TITLE")
    decade = item.get("decade")
    print(f"Artist: {artist}, Album: {title}, Decade: {decade}")

Output:

> Artist: Gorillaz, Album: Gorillaz, Decade: 00s
> Artist: Gorillaz, Album: Demon Days, Decade: 00s

I think you're overcomplicating things a bit; try another library together with xpath:

import lxml.html as lh

cds = """[your html above]"""

doc = lh.fromstring(cds)
for cd in doc.xpath('//cd'):
    decade = cd.xpath('./@decade')[0]
    title = cd.xpath('./title/text()')[0]
    artist = cd.xpath('./artist/text()')[0]
    print("Title: "+title+", Artist: "+artist+", Decade: "+decade)

Output:

Title: Gorillaz, Artist: Gorillaz, Decade: 00s
Title: Demon Days, Artist: Gorillaz, Decade: 00s
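If installing lxml is not an option, a similar lookup can be sketched with the standard library, since `xml.etree.ElementTree` supports a limited XPath subset, including attribute predicates. This is only an illustration: the inline `SAMPLE_XML` and its second record (with a made-up `80s` decade) are assumptions, not the asker's data. Note that ElementTree keeps tag names case-sensitive, so the path uses `CD` rather than `cd`.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample; the second record is invented for illustration
SAMPLE_XML = """<CATALOG>
    <CD decade="00s">
        <TITLE>Gorillaz</TITLE>
        <ARTIST>Gorillaz</ARTIST>
    </CD>
    <CD decade="80s">
        <TITLE>Some Older Album</TITLE>
        <ARTIST>Some Artist</ARTIST>
    </CD>
</CATALOG>"""

root = ET.fromstring(SAMPLE_XML)

# Attribute predicate: keep only CD elements whose decade attribute is "00s"
matches = root.findall(".//CD[@decade='00s']")
titles = [cd.findtext("TITLE") for cd in matches]

for cd in matches:
    print("Title: " + cd.findtext("TITLE") + ", Artist: " + cd.findtext("ARTIST")
          + ", Decade: " + cd.get("decade"))
```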
