Extracting specific data from XML using Python

Posted 2024-05-20 20:22:01


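I want to collect specific information from data.xml, working from root[0]. 'CaplockSet' contains more than 100 'Caplock' elements, and I only need to extract the author information. Please help; thank you very much for your support.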

<?xml version="1.0"?>

<CaplockSet>

<Caplock>
    <MedlineCitation Status="clonelisher" Owner="NLM">
        <PMID Version="1">32045906</PMID>
        <DateRevised>
            <Year>2020</Year>
            <Month>02</Month>
            <Day>11</Day>
        </DateRevised>
        <Article cloneModel="Print-Electronic">
            <Journal>
                <ISSN IssnType="Electronic">1423-0135</ISSN>
                <JournalIssue CitedMedium="Internet">
                    <cloneDate>
                        <Year>2020</Year>
                        <Month>Feb</Month>
                        <Day>11</Day>
                    </cloneDate>
                </JournalIssue>
                <Title>Journal of vascular research</Title>
                <ISOAbbreviation>J. Vasc. Res.</ISOAbbreviation>
            </Journal>
            <ArticleTitle>miR-96-5p Regulates Proliferation, Migration, and Apoptosis of Vascular Smooth Muscle Cell Induced by Angiotensin II via Targeting NFAT5.</ArticleTitle>
            <Pagination>
                <MedlinePgn>1-11</MedlinePgn>
            </Pagination>
            <ELocationID EIdType="doi" ValidYN="Y">10.1159/000505457</ELocationID>
            <Abstract>
                <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Aberrant proliferation, migration, and apoptosis of vascular smooth muscle cells (VSMCs) are major pathological phenomenon in hypertension. MicroRNAs (miRNAs/miRs) serve crucial roles in the progression of hypertension. We aimed to determine the role of miR-96-5p in the proliferation, migration, and apoptosis of VSMCs and its underlying mechanisms.</AbstractText>
                <AbstractText Label="METHODS" NlmCategory="METHODS">Angiotensin II (Ang II) was employed to treat VSMCs, and the expression of miR-96-5p was detected by RT-qPCR. Then, miR-96-5p mimic was transfected into VSMCs. Cell Counting Kit-8 assay, flow cytometry, transwell assay, and wound healing assay were applied to measure proliferation, cell cycle, and migration of VSMCs. The expression of proteins associated with proliferation, migration, and apoptosis was assessed. A luciferase reporter assay was applied to confirm the target binding between miR-96-5p and nuclear factors of activated T-cells 5 (NFAT5). Subsequently, siRNA was used to silence NFAT5, and cell proliferation, migration, and apoptosis were assessed.</AbstractText>
                <AbstractText Label="RESULTS" NlmCategory="RESULTS">The results revealed that the expression of miR-96-5p was downregulated in Ang II-induced VSMCs. MiR-96-5p overexpression inhibited cell proliferation and migration but promoted cell apoptosis, enhanced the percentages of cells in the G1 and G2 phases, and reduced those in the S phase, accompanied by changes in the expression associated proteins. NFAT5 was confirmed as a direct target of miR-96-5p. NFAT5 silencing had the same results with miR-96-5p overexpression on VSMC proliferation, migration, and apoptosis, whereas miR-96-5p inhibitor reversed these effects.</AbstractText>
                <AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">Our findings concluded that miR-96-5p could regulate proliferation, migration, and apoptosis of VSMCs induced by Ang II via targeting NFAT5.</AbstractText>
                <CopyrightInformation>© 2020 S. Karger AG, Basel.</CopyrightInformation>
            </Abstract>
            <AuthorList CompleteYN="Y">
                <Author ValidYN="Y">
                    <LastName>Tian</LastName>
                    <ForeName>Long</ForeName>
                    <Initials>L</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Cai</LastName>
                    <ForeName>Dinghua</ForeName>
                    <Initials>D</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Zhuang</LastName>
                    <ForeName>Derong</ForeName>
                    <Initials>D</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wang</LastName>
                    <ForeName>Wenyuan</ForeName>
                    <Initials>W</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wang</LastName>
                    <ForeName>Xuan</ForeName>
                    <Initials>X</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Bian</LastName>
                    <ForeName>Xiaoli</ForeName>
                    <Initials>X</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Xu</LastName>
                    <ForeName>Rui</ForeName>
                    <Initials>R</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Nephrology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wu</LastName>
                    <ForeName>Guanji</ForeName>
                    <Initials>G</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Xi'an Central Hospital of Xi'an Jiaotong University, Xi'an, China, guanjiguanji22@163.com.</Affiliation>
                    </AffiliationInfo>
                </Author>
            </AuthorList>
            <Language>eng</Language>
            <clonelicationTypeList>
                <clonelicationType UI="D016428">Journal Article</clonelicationType>
            </clonelicationTypeList>
            <ArticleDate DateType="Electronic">
                <Year>2020</Year>
                <Month>02</Month>
                <Day>11</Day>
            </ArticleDate>
        </Article>
        <MedlineJournalInfo>
            <Country>Switzerland</Country>
            <MedlineTA>J Vasc Res</MedlineTA>
            <NlmUniqueID>9206092</NlmUniqueID>
            <ISSNLinking>1018-1172</ISSNLinking>
        </MedlineJournalInfo>
        <CitationSubset>IM</CitationSubset>
        <KeywordList Owner="NOTNLM">
            <Keyword MajorTopicYN="N">Migration</Keyword>
            <Keyword MajorTopicYN="N">NFAT5</Keyword>
            <Keyword MajorTopicYN="N">Proliferation</Keyword>
            <Keyword MajorTopicYN="N">Vascular smooth muscle cell</Keyword>
            <Keyword MajorTopicYN="N">miR-96-5p</Keyword>
        </KeywordList>
    </MedlineCitation>
    <CardData>
        <History>
            <CardcloneDate cloneStatus="received">
                <Year>2019</Year>
                <Month>09</Month>
                <Day>16</Day>
            </CardcloneDate>
            <CardcloneDate cloneStatus="accepted">
                <Year>2019</Year>
                <Month>12</Month>
                <Day>16</Day>
            </CardcloneDate>
            <CardcloneDate cloneStatus="entrez">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
            <CardcloneDate cloneStatus="Card">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
            <CardcloneDate cloneStatus="medline">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
        </History>
        <clonelicationStatus>aheadofprint</clonelicationStatus>
        <ArticleIdList>
            <ArticleId IdType="Card">32045906</ArticleId>
            <ArticleId IdType="pii">000505457</ArticleId>
            <ArticleId IdType="doi">10.1159/000505457</ArticleId>
        </ArticleIdList>
    </CardData>
</Caplock>


</CaplockSet>

I have tried several approaches in my .py script, but I keep running into errors. One of the approaches is shown below:

import xml.etree.ElementTree as ET

mytree = ET.parse('data.xml')
myroot = mytree.getroot()
for x in myroot.findall('Author'):
    lastname = x.find('LastName').text
    forename = x.find('ForeName').text
    affiliation = x.find('AffiliationInfo/Affiliation').text

    print(lastname,forename,affiliation)

Error:

Traceback (most recent call last):
  File "c:/Users/jeeva/Desktop/data/program.py", line 3, in <module>
    mytree = ET.parse('data/data.xml')
  File "C:\Users\jeeva\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1202, in parse
    tree.parse(source, parser)
  File "C:\Users\jeeva\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 595, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: syntax error: line 2, column 21
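
Note that the traceback above is raised by `ET.parse` itself, so the file is rejected before any `findall` call runs. Here is a minimal sketch for locating the text the parser objected to, assuming the file is readable as UTF-8. `parse_or_report` is a hypothetical helper name, not part of ElementTree; `ParseError.position` is the (line, column) pair that appears in the error message.

```python
import xml.etree.ElementTree as ET


def parse_or_report(path):
    """Parse an XML file; on failure, print the line the parser rejected.

    Hypothetical helper for illustration, not part of ElementTree.
    """
    try:
        return ET.parse(path).getroot()
    except ET.ParseError as err:
        # (line, column) as shown in the error message
        line_no, column = err.position
        with open(path, encoding="utf-8", errors="replace") as fh:
            lines = fh.read().splitlines()
        print(f"Parse error: {err}")
        if 1 <= line_no <= len(lines):
            print(lines[line_no - 1])
            print(" " * column + "^")  # approximate marker under the column
        return None
```

Calling `root = parse_or_report('data.xml')` would then either return the root element or show the rejected line, which is usually enough to tell whether the file on disk differs from the sample pasted here.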

2 answers

Maybe this will work:

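import xml.etree.ElementTree as ET


def find_rec(node):
    """Yield one dict of tag -> text per Author element found below node."""
    for item in node.iter():
        if item.tag == "Author":
            author_values = {}
            # iter() visits the Author element itself and all its descendants
            for i in item.iter():
                author_values[i.tag] = i.text
            yield author_values


auth = find_rec(ET.parse('./data.xml').getroot())
for x in auth:
    print(x["LastName"], x["ForeName"], x["Affiliation"])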
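
For context on why the `myroot.findall('Author')` loop in the question would find nothing even once the file parses: a bare tag name in `findall` matches only direct children of the element, and `Author` sits several levels below `CaplockSet`. Below is a minimal sketch using the `.//Author` path instead, with `findtext` defaults so that an author without an affiliation does not raise `AttributeError`. The trimmed sample string and the `extract_authors` name are illustrative assumptions, not from the original post.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample in the same shape as data.xml (illustrative only).
SAMPLE = """<CaplockSet><Caplock><MedlineCitation><Article><AuthorList>
  <Author><LastName>Tian</LastName><ForeName>Long</ForeName>
    <AffiliationInfo><Affiliation>Jiangdu People's Hospital</Affiliation></AffiliationInfo>
  </Author>
  <Author><LastName>Cai</LastName><ForeName>Dinghua</ForeName></Author>
</AuthorList></Article></MedlineCitation></Caplock></CaplockSet>"""


def extract_authors(root):
    """Return (last name, fore name, affiliation) for every Author below root."""
    rows = []
    for author in root.findall(".//Author"):
        rows.append((
            author.findtext("LastName", default=""),
            author.findtext("ForeName", default=""),
            # Empty string when the author has no affiliation element
            author.findtext("AffiliationInfo/Affiliation", default=""),
        ))
    return rows


root = ET.fromstring(SAMPLE)
print(len(root.findall("Author")))     # 0: direct children only
print(len(root.findall(".//Author")))  # 2: any depth
for row in extract_authors(root):
    print(*row)
```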

A one-liner:

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0"?>
<CaplockSet>
<Caplock>
    <MedlineCitation Status="clonelisher" Owner="NLM">
        <PMID Version="1">32045906</PMID>
        <DateRevised>
            <Year>2020</Year>
            <Month>02</Month>
            <Day>11</Day>
        </DateRevised>
        <Article cloneModel="Print-Electronic">
            <Journal>
                <ISSN IssnType="Electronic">1423-0135</ISSN>
                <JournalIssue CitedMedium="Internet">
                    <cloneDate>
                        <Year>2020</Year>
                        <Month>Feb</Month>
                        <Day>11</Day>
                    </cloneDate>
                </JournalIssue>
                <Title>Journal of vascular research</Title>
                <ISOAbbreviation>J. Vasc. Res.</ISOAbbreviation>
            </Journal>
            <ArticleTitle>miR-96-5p Regulates Proliferation, Migration, and Apoptosis of Vascular Smooth Muscle Cell Induced by Angiotensin II via Targeting NFAT5.</ArticleTitle>
            <Pagination>
                <MedlinePgn>1-11</MedlinePgn>
            </Pagination>
            <ELocationID EIdType="doi" ValidYN="Y">10.1159/000505457</ELocationID>
            <Abstract>
                <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Aberrant proliferation, migration, and apoptosis of vascular smooth muscle cells (VSMCs) are major pathological phenomenon in hypertension. MicroRNAs (miRNAs/miRs) serve crucial roles in the progression of hypertension. We aimed to determine the role of miR-96-5p in the proliferation, migration, and apoptosis of VSMCs and its underlying mechanisms.</AbstractText>
                <AbstractText Label="METHODS" NlmCategory="METHODS">Angiotensin II (Ang II) was employed to treat VSMCs, and the expression of miR-96-5p was detected by RT-qPCR. Then, miR-96-5p mimic was transfected into VSMCs. Cell Counting Kit-8 assay, flow cytometry, transwell assay, and wound healing assay were applied to measure proliferation, cell cycle, and migration of VSMCs. The expression of proteins associated with proliferation, migration, and apoptosis was assessed. A luciferase reporter assay was applied to confirm the target binding between miR-96-5p and nuclear factors of activated T-cells 5 (NFAT5). Subsequently, siRNA was used to silence NFAT5, and cell proliferation, migration, and apoptosis were assessed.</AbstractText>
                <AbstractText Label="RESULTS" NlmCategory="RESULTS">The results revealed that the expression of miR-96-5p was downregulated in Ang II-induced VSMCs. MiR-96-5p overexpression inhibited cell proliferation and migration but promoted cell apoptosis, enhanced the percentages of cells in the G1 and G2 phases, and reduced those in the S phase, accompanied by changes in the expression associated proteins. NFAT5 was confirmed as a direct target of miR-96-5p. NFAT5 silencing had the same results with miR-96-5p overexpression on VSMC proliferation, migration, and apoptosis, whereas miR-96-5p inhibitor reversed these effects.</AbstractText>
                <AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">Our findings concluded that miR-96-5p could regulate proliferation, migration, and apoptosis of VSMCs induced by Ang II via targeting NFAT5.</AbstractText>
                <CopyrightInformation>© 2020 S. Karger AG, Basel.</CopyrightInformation>
            </Abstract>
            <AuthorList CompleteYN="Y">
                <Author ValidYN="Y">
                    <LastName>Tian</LastName>
                    <ForeName>Long</ForeName>
                    <Initials>L</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Cai</LastName>
                    <ForeName>Dinghua</ForeName>
                    <Initials>D</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Zhuang</LastName>
                    <ForeName>Derong</ForeName>
                    <Initials>D</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wang</LastName>
                    <ForeName>Wenyuan</ForeName>
                    <Initials>W</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wang</LastName>
                    <ForeName>Xuan</ForeName>
                    <Initials>X</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Bian</LastName>
                    <ForeName>Xiaoli</ForeName>
                    <Initials>X</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Xu</LastName>
                    <ForeName>Rui</ForeName>
                    <Initials>R</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Nephrology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wu</LastName>
                    <ForeName>Guanji</ForeName>
                    <Initials>G</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Xi'an Central Hospital of Xi'an Jiaotong University, Xi'an, China, guanjiguanji22@163.com.</Affiliation>
                    </AffiliationInfo>
                </Author>
            </AuthorList>
            <Language>eng</Language>
            <clonelicationTypeList>
                <clonelicationType UI="D016428">Journal Article</clonelicationType>
            </clonelicationTypeList>
            <ArticleDate DateType="Electronic">
                <Year>2020</Year>
                <Month>02</Month>
                <Day>11</Day>
            </ArticleDate>
        </Article>
        <MedlineJournalInfo>
            <Country>Switzerland</Country>
            <MedlineTA>J Vasc Res</MedlineTA>
            <NlmUniqueID>9206092</NlmUniqueID>
            <ISSNLinking>1018-1172</ISSNLinking>
        </MedlineJournalInfo>
        <CitationSubset>IM</CitationSubset>
        <KeywordList Owner="NOTNLM">
            <Keyword MajorTopicYN="N">Migration</Keyword>
            <Keyword MajorTopicYN="N">NFAT5</Keyword>
            <Keyword MajorTopicYN="N">Proliferation</Keyword>
            <Keyword MajorTopicYN="N">Vascular smooth muscle cell</Keyword>
            <Keyword MajorTopicYN="N">miR-96-5p</Keyword>
        </KeywordList>
    </MedlineCitation>
    <CardData>
        <History>
            <CardcloneDate cloneStatus="received">
                <Year>2019</Year>
                <Month>09</Month>
                <Day>16</Day>
            </CardcloneDate>
            <CardcloneDate cloneStatus="accepted">
                <Year>2019</Year>
                <Month>12</Month>
                <Day>16</Day>
            </CardcloneDate>
            <CardcloneDate cloneStatus="entrez">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
            <CardcloneDate cloneStatus="Card">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
            <CardcloneDate cloneStatus="medline">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
        </History>
        <clonelicationStatus>aheadofprint</clonelicationStatus>
        <ArticleIdList>
            <ArticleId IdType="Card">32045906</ArticleId>
            <ArticleId IdType="pii">000505457</ArticleId>
            <ArticleId IdType="doi">10.1159/000505457</ArticleId>
        </ArticleIdList>
    </CardData>
</Caplock>
</CaplockSet>'''

root = ET.fromstring(xml)
data = [{'Affiliation': a.find('AffiliationInfo/Affiliation').text, 'ForeName': a.find('ForeName').text, 'LastName': a.find('LastName').text} for a in root.findall('.//Author')]
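
Because the real file reportedly holds more than 100 `Caplock` records, it may help to keep each author list attached to its record rather than flattening everything into one list. The following is a rough sketch, assuming every `Caplock` has a `MedlineCitation/PMID` child as in the sample; `authors_by_pmid` and the shortened sample string are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same nesting as the XML above (illustrative only).
SAMPLE = """<CaplockSet>
<Caplock><MedlineCitation><PMID Version="1">32045906</PMID><Article><AuthorList>
  <Author><LastName>Tian</LastName><ForeName>Long</ForeName></Author>
  <Author><LastName>Cai</LastName><ForeName>Dinghua</ForeName></Author>
</AuthorList></Article></MedlineCitation></Caplock>
</CaplockSet>"""


def authors_by_pmid(root):
    """Map each record's PMID to a list of 'ForeName LastName' strings."""
    result = {}
    for record in root.findall("Caplock"):  # Caplock is a direct child of the root
        # Records without a PMID would all share the key "unknown"
        pmid = record.findtext("MedlineCitation/PMID", default="unknown")
        result[pmid] = [
            f"{a.findtext('ForeName', default='')} {a.findtext('LastName', default='')}".strip()
            for a in record.findall(".//Author")
        ]
    return result


print(authors_by_pmid(ET.fromstring(SAMPLE)))
```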
