Parsing multiple nodes of an XML file in Python with xml.etree

Published 2024-09-28 20:46:47


import os
from xml.etree import ElementTree

file_name = 'pubmed21n0001.xml'
full_file = os.path.abspath(os.path.join('data', file_name))

dom = ElementTree.parse(full_file)
pubmed = dom.findall('PubmedArticle/MedlineCitation')

for p in pubmed:
    LastName = p.find('Article/AuthorList/Author/LastName').text
    ForeName = p.find('Article/AuthorList/Author/ForeName').text
    Initials = p.find('Article/AuthorList/Author/Initials').text
    print('{}_{}_{}'.format(LastName, ForeName, Initials))

This is the Python code I wrote.

<root>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <Article PubModel="Print">
        <AuthorList CompleteYN="Y">
          <Author ValidYN="Y">
            <LastName>Makar</LastName>
            <ForeName>A B</ForeName>
            <Initials>AB</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>McMartin</LastName>
            <ForeName>K E</ForeName>
            <Initials>KE</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Palese</LastName>
            <ForeName>M</ForeName>
            <Initials>M</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Tephly</LastName>
            <ForeName>T R</ForeName>
            <Initials>TR</Initials>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</root>

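How can I parse multiple names? When I run the code, only one name is printed. This is a condensed version of the pubmed21n0001.xml file; the original file contains many more entries.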


2 answers

Go one level deeper in the path, all the way down to the Author elements:


# Find the list of Author nodes (dom is the parsed tree from the question)
pubmed = dom.findall('PubmedArticle/MedlineCitation/Article/AuthorList/Author')

for p in pubmed:
    # Each p is an Author element, so its children can be accessed directly
    LastName = p.find('LastName').text
    ForeName = p.find('ForeName').text
    Initials = p.find('Initials').text
    print('{}_{}_{}'.format(LastName, ForeName, Initials))
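
The loop in the question prints a single name because `find` returns only the first matching element, whereas `findall` returns every match. If the authors should also stay grouped by article (the full file contains many articles), a nested loop is one option. Below is a minimal sketch, assuming the element layout of the sample in the question; the `authors_by_article` helper and the inline `SAMPLE` string are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-article sample mirroring the structure in the question.
SAMPLE = """<root>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <AuthorList>
          <Author><LastName>Makar</LastName><ForeName>A B</ForeName><Initials>AB</Initials></Author>
          <Author><LastName>McMartin</LastName><ForeName>K E</ForeName><Initials>KE</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <AuthorList>
          <Author><LastName>Palese</LastName><ForeName>M</ForeName><Initials>M</Initials></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</root>"""


def authors_by_article(root):
    """Return one list of 'Last_Fore_Initials' strings per MedlineCitation."""
    result = []
    for citation in root.findall('PubmedArticle/MedlineCitation'):
        names = []
        # findall() yields every Author; find() would stop at the first one.
        for author in citation.findall('Article/AuthorList/Author'):
            names.append('{}_{}_{}'.format(
                author.find('LastName').text,
                author.find('ForeName').text,
                author.find('Initials').text,
            ))
        result.append(names)
    return result


root = ET.fromstring(SAMPLE)
grouped = authors_by_article(root)
for names in grouped:
    print(names)
```

The outer `findall` keeps the per-article boundary, and the inner `findall` is evaluated relative to each `MedlineCitation`, so authors from different articles are not mixed together.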

Try the following approach:

import xml.etree.ElementTree as ET

xml = '''<root>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <Article PubModel="Print">
        <AuthorList CompleteYN="Y">
          <Author ValidYN="Y">
            <LastName>Makar</LastName>
            <ForeName>A B</ForeName>
            <Initials>AB</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>McMartin</LastName>
            <ForeName>K E</ForeName>
            <Initials>KE</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Palese</LastName>
            <ForeName>M</ForeName>
            <Initials>M</Initials>
          </Author>
          <Author ValidYN="Y">
            <LastName>Tephly</LastName>
            <ForeName>T R</ForeName>
            <Initials>TR</Initials>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</root>'''

root = ET.fromstring(xml)
for a in root.findall('.//Author'):
    print(f'{a.find("LastName").text}_{a.find("ForeName").text}_{a.find("Initials").text}')

Output

Makar_A B_AB
McMartin_K E_KE
Palese_M_M
Tephly_T R_TR
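
Both answers call `.text` on the result of `find`, which raises `AttributeError` when an Author has no such child, because `find` then returns `None`. Whether the full file contains such entries is an assumption here; the sample shows none. A hedged sketch using `findtext` with a default value follows; the `author_label` helper and the incomplete sample entry are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second Author lacks ForeName and Initials.
SAMPLE = """<root>
  <Author><LastName>Makar</LastName><ForeName>A B</ForeName><Initials>AB</Initials></Author>
  <Author><LastName>Tephly</LastName></Author>
</root>"""


def author_label(author, missing=''):
    """Join the three name fields, substituting `missing` for absent children."""
    tags = ('LastName', 'ForeName', 'Initials')
    # findtext() returns the default instead of failing when the child is absent.
    return '_'.join(author.findtext(tag, default=missing) for tag in tags)


labels = [author_label(a) for a in ET.fromstring(SAMPLE).findall('.//Author')]
print(labels)
```

Passing a visible placeholder such as `missing='?'` instead of the empty string makes incomplete entries easier to spot in the output.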
