XPath with an lxml Element

Posted 2024-09-29 06:30:20

I am trying to parse an XML document using lxml etree. The XML document I am parsing looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.openarchives.org/OAI/2.0/">
    <codeBook version="2.5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="ddi:codebook:2_5" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
        <docDscr>
            <citation>
                <titlStmt>
                    <titl>Test Title</titl>
                </titlStmt>
                <prodStmt>
                    <prodDate/>
                </prodStmt>
            </citation>
        </docDscr>
        <stdyDscr>
            <citation>
                <titlStmt>
                    <titl>Test Title 2</titl>
                    <IDNo agency="UKDA">101</IDNo>
                </titlStmt>
                <rspStmt>
                    <AuthEnty>TestAuthEntry</AuthEnty>
                </rspStmt>
                <prodStmt>
                    <copyright>Yes</copyright>
                </prodStmt>
                <distStmt/>
                <verStmt>
                    <version date="">1</version>
                </verStmt>
            </citation>
            <stdyInfo>
                <subject>
                    <keyword>2009</keyword>
                    <keyword>2010</keyword>
                    <topcClas>CLASS</topcClas>
                    <topcClas>ffdsf</topcClas>
                </subject>
                <abstract>This is an abstract piece of text.</abstract>
                <sumDscr>
                    <timePrd event="single">2020</timePrd>
                    <nation>UK</nation>
                    <anlyUnit>Test</anlyUnit>
                    <universe>test</universe>
                    <universe>hello</universe>
                    <dataKind>fdsfdsf</dataKind>
                </sumDscr>
            </stdyInfo>
            <method>
                <dataColl>
                    <timeMeth>test timemeth</timeMeth>
                    <dataCollector>test data collector</dataCollector>
                    <sampProc>test sampprocess</sampProc>
                    <deviat>test deviat</deviat>
                    <collMode>test collMode</collMode>
                    <sources/>
                </dataColl>
            </method>
            <dataAccs>
                <setAvail>
                    <accsPlac>Test accsPlac</accsPlac>
                </setAvail>
                <useStmt>
                    <restrctn>NONE</restrctn>
                </useStmt>
            </dataAccs>
            <othrStdyMat>
                <relPubl>122</relPubl>
                <relPubl>12332</relPubl>
            </othrStdyMat>
        </stdyDscr>
    </codeBook>
</metadata>

I wrote the following code to try to process it:

^{pr2}$

From my understanding of the lxml XPath docs, I should be able to get the text of a specific element like this:

xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')

However, when I run this, it returns an empty list.

The only XPath that returns anything is a wildcard:

xml_doc.xpath('*')

which returns [<Element {ddi:codebook:2_5}codeBook at 0x7f8da8a413f8>].

I have read through the documentation and I don't understand what is going on. Any help is appreciated.
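
For reference, the {ddi:codebook:2_5} part of the wildcard result suggests the elements are namespace-qualified, which would explain why the unprefixed path matches nothing: in XPath 1.0 an unprefixed name only matches elements in no namespace. The following is a minimal sketch of that behaviour, not the original code from the question. It uses a trimmed-down copy of the document above, and the prefixes oai and ddi are arbitrary labels chosen for this example (only the namespace URIs come from the document).

```python
from lxml import etree

# Trimmed-down copy of the document from the question (same default namespaces).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://www.openarchives.org/OAI/2.0/">
    <codeBook version="2.5" xmlns="ddi:codebook:2_5">
        <docDscr>
            <citation>
                <titlStmt>
                    <titl>Test Title</titl>
                </titlStmt>
            </citation>
        </docDscr>
    </codeBook>
</metadata>"""

# Assumption: "oai" and "ddi" are prefixes made up for this sketch.
# Only the URIs have to match the xmlns declarations in the document.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "ddi": "ddi:codebook:2_5",
}

root = etree.fromstring(XML)

# Unprefixed names only match elements in no namespace, so this finds nothing.
unqualified = root.xpath(
    "/metadata/codeBook/docDscr/citation/titlStmt/titl/text()"
)

# Each step carries the prefix of the namespace that element lives in.
qualified = root.xpath(
    "/oai:metadata/ddi:codeBook/ddi:docDscr/ddi:citation"
    "/ddi:titlStmt/ddi:titl/text()",
    namespaces=NS,
)

print(unqualified)
print(qualified)
```

Note that the outer metadata element and everything under codeBook sit in two different default namespaces, so the sketch needs two prefixes rather than one.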

