Getting the content of a specific tag in XML using ElementTree

Posted 2024-09-27 19:19:51


Here is my XML data:

<PubmedArticle>
<MedlineCitation Status="MEDLINE" Owner="NLM">
  <PMID Version="1">1883738</PMID>
  <DateCompleted>
    <Year>1991</Year>
    <Month>10</Month>
    <Day>07</Day>
  </DateCompleted>
  <DateRevised>
    <Year>2013</Year>
    <Month>11</Month>
    <Day>21</Day>
  </DateRevised>
  <Article PubModel="Print">
    <Journal>
      <ISSN IssnType="Print">0959-9673</ISSN>
      <JournalIssue CitedMedium="Print">
        <Volume>72</Volume>
        <Issue>4</Issue>
        <PubDate>
          <Year>1991</Year>
          <Month>Aug</Month>
        </PubDate>
      </JournalIssue>
      <Title>International journal of experimental pathology</Title>
      <ISOAbbreviation>Int J Exp Pathol</ISOAbbreviation>
    </Journal>
    <ArticleTitle>The effect of HeNe laser radiation on the thyroid gland of the rat.</ArticleTitle>
    <Pagination>
      <MedlinePgn>379-85</MedlinePgn>
    </Pagination>
    <Abstract>
      <AbstractText>Although laser irradiation is becoming common practice in medicine, there is not always a clear understanding of the possible side-effects. The present report is a light and electron microscopic study of the effects of fixed low intensity doses of soft HeNe laser on the thyroid of Wistar rats. The immediate effects are mild multifocal degenerative changes; these lesions recover in less than 3 months. Long-term lesions are identified only by electron microscopy; they consist of an increased number of peroxisomes and free or intramitochondrial crystalline structures. We discuss the laser's hypothetical functions.</AbstractText>
    </Abstract>
    <AuthorList CompleteYN="Y">
      <Author ValidYN="Y">
        <LastName>Lerma</LastName>
        <ForeName>E</ForeName>
        <Initials>E</Initials>
        <AffiliationInfo>
          <Affiliation>Department of Pathology and Radiology, Hospital Universitario Virgen Macarena, University of Seville, Spain.</Affiliation>
        </AffiliationInfo>
      </Author>
      <Author ValidYN="Y">
        <LastName>Hevia</LastName>
        <ForeName>A</ForeName>
        <Initials>A</Initials>
      </Author>
      <Author ValidYN="Y">
        <LastName>Rodrigo</LastName>
        <ForeName>P</ForeName>
        <Initials>P</Initials>
      </Author>
      <Author ValidYN="Y">
        <LastName>Gonzalez-Campora</LastName>
        <ForeName>R</ForeName>
        <Initials>R</Initials>
      </Author>
      <Author ValidYN="Y">
        <LastName>Armas</LastName>
        <ForeName>J R</ForeName>
        <Initials>JR</Initials>
      </Author>
      <Author ValidYN="Y">
        <LastName>Galera</LastName>
        <ForeName>H</ForeName>
        <Initials>H</Initials>
      </Author>
    </AuthorList>
    <Language>eng</Language>
    <PublicationTypeList>
      <PublicationType UI="D016428">Journal Article</PublicationType>
    </PublicationTypeList>
  </Article>
  <MedlineJournalInfo>
    <Country>England</Country>
    <MedlineTA>Int J Exp Pathol</MedlineTA>
    <NlmUniqueID>9014042</NlmUniqueID>
    <ISSNLinking>0959-9673</ISSNLinking>
  </MedlineJournalInfo>
  <ChemicalList>
    <Chemical>
      <RegistryNumber>06LU7C9H1V</RegistryNumber>
      <NameOfSubstance UI="D014284">Triiodothyronine</NameOfSubstance>
    </Chemical>
    <Chemical>
      <RegistryNumber>Q51BO43MG4</RegistryNumber>
      <NameOfSubstance UI="D013974">Thyroxine</NameOfSubstance>
    </Chemical>
  </ChemicalList>
  <CitationSubset>IM</CitationSubset>
  <CommentsCorrectionsList>
    <CommentsCorrections RefType="Cites">
      <RefSource>J Histochem Cytochem. 1969 Oct;17(10):675-80</RefSource>
      <PMID Version="1">4194356</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>Acta Anat (Basel). 1986;125(1):10-3</RefSource>
      <PMID Version="1">3953239</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>Anat Anz. 1977;142(3):209-12</RefSource>
      <PMID Version="1">603070</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>J Cell Biol. 1964 Nov;23:383-5</RefSource>
      <PMID Version="1">14222822</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>J Cell Biol. 1967 Jun;33(3):605-23</RefSource>
      <PMID Version="1">6036524</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>Am J Med. 1983 May;74(5):852-62</RefSource>
      <PMID Version="1">6837608</PMID>
    </CommentsCorrections>
    <CommentsCorrections RefType="Cites">
      <RefSource>Exp Eye Res. 1977 Jan;24(1):45-56</RefSource>
      <PMID Version="1">402283</PMID>
    </CommentsCorrections>
  </CommentsCorrectionsList>
  <MeshHeadingList>
    <MeshHeading>
      <DescriptorName UI="D000818" MajorTopicYN="N">Animals</DescriptorName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D007834" MajorTopicYN="N">Lasers</DescriptorName>
      <QualifierName UI="Q000009" MajorTopicYN="Y">adverse effects</QualifierName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D008297" MajorTopicYN="N">Male</DescriptorName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D008830" MajorTopicYN="N">Microbodies</DescriptorName>
      <QualifierName UI="Q000528" MajorTopicYN="N">radiation effects</QualifierName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D008854" MajorTopicYN="N">Microscopy, Electron</DescriptorName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D051381" MajorTopicYN="N">Rats</DescriptorName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D011919" MajorTopicYN="N">Rats, Inbred Strains</DescriptorName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D013961" MajorTopicYN="N">Thyroid Gland</DescriptorName>
      <QualifierName UI="Q000528" MajorTopicYN="Y">radiation effects</QualifierName>
      <QualifierName UI="Q000648" MajorTopicYN="N">ultrastructure</QualifierName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D013974" MajorTopicYN="N">Thyroxine</DescriptorName>
      <QualifierName UI="Q000097" MajorTopicYN="N">blood</QualifierName>
    </MeshHeading>
    <MeshHeading>
      <DescriptorName UI="D014284" MajorTopicYN="N">Triiodothyronine</DescriptorName>
      <QualifierName UI="Q000097" MajorTopicYN="N">blood</QualifierName>
    </MeshHeading>
  </MeshHeadingList>
  <OtherID Source="NLM">PMC2001961</OtherID>
</MedlineCitation>
<PubmedData>

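I need to extract the last names of all the authors from this document. There are multiple files like this one, each with different authors. How can I parse the files and collect the authors' last names into a list so that I can build a database?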

I used ElementTree to parse the document. Here is my code:

import xml.etree.ElementTree as ET

tree = ET.parse("file path" + file)
doc = tree.getroot()
for LastName in doc.iter('LastName'):
    # Serialize the element, then slice the surname out of the resulting string
    file1 = ET.tostring(LastName, encoding='utf8').decode('utf8')
    file2 = file1[48:]
    author_name_lastname = file2.split("<")[0]
    print(author_name_lastname)

With this I can only print the first author's last name, "Lerma".


1 Answer
User
#1 · Posted 2024-09-27 19:19:51
import os
from lxml import etree as ET

# Raw string so the backslash is not treated as an escape sequence
DIR = r"D:\yourfilesdirectory"

for filename in os.listdir(DIR):
    if filename.endswith(".xml"):
        # Let lxml open the file itself so it can honour the XML encoding declaration
        _root = ET.parse(os.path.join(DIR, filename)).getroot()
        _all_metadata_tags = _root.xpath('.//LastName')
        for i in _all_metadata_tags:
            # The element's text is the surname; no string slicing is needed
            print(i.text)
    else:
        print(f"skipping {filename}")
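
The question asks for the last names in a list rather than printed output. The following is a minimal sketch of that step using only the standard library's xml.etree.ElementTree instead of lxml. It assumes every .xml file in the directory is a well-formed document whose authors appear as LastName elements, as in the sample above. The function name collect_last_names is hypothetical and is not part of the original answer.

```python
import os
import xml.etree.ElementTree as ET


def collect_last_names(directory):
    """Return the text of every <LastName> element in the .xml files of `directory`."""
    last_names = []
    # Sort the file names so the result order is reproducible
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".xml"):
            continue
        root = ET.parse(os.path.join(directory, filename)).getroot()
        for element in root.iter("LastName"):
            # .text is None for an empty element, so skip those
            if element.text:
                last_names.append(element.text.strip())
    return last_names
```

Calling collect_last_names(r"D:\yourfilesdirectory") would return one flat list covering all files. If the database needs to know which file each name came from, the function could append (filename, name) tuples instead.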
