Using ElementTree/lxml, does findall only work with the .// wildcard and not with an absolute path?

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE PPP SYSTEM 'PPP.DTD'>
<book chg="R" model="AB" >
  <chapter chapnbr="09" chg="U" key="EN49" >
    <effect effrg="Afcd"/>
    <title>HOW TO WIN</title>
    <section chapnbr="09" chg="U" key="Edff" revdate="20100701" sectnbr="102">
      <title>What a start</title>
      <subject chapnbr="09" chg="U" key="Edff" revdate="20100701" sectnbr="102" subjnbr="00">
        <title>1.A</title>
        <pgblk chapnbr="09" chg="U" confnbr="00" key="Edff00" pgblknbr="00" revdate="20200701" sectnbr="102" subjnbr="00">
          <effect effrg="12"/>
          <title>1.A.i) Plan Ahead for the worst</title>
          <prclist1>
            <prcitem1 adns-numbering="8" adns-title="learning my way with help of good people" >
              <effect effrg="Edff"/>
              <prcitem asFragment="true">
                <title>1.A.i) Plan Ahead for the worst</title>
                <para>It was a cold January night, and I had too much whisky. <refblock> 09-102-00 <refint rrr="22,445,555,555,555" refid="Edff0898"> <effect effrg="Edff0899"/> 0910200</refint> </refblock>. </para>
                <para>In more usual circumstances, I possesed the self-control. Not this time <refblock> 09-102-00-1111 <refint rrr="sdf,2323,2323" refid="Edff123"> <effect effrg="Edff12434"/> 09-102-00</refint> </refblock>. </para>
              </prcitem>
            </prcitem1>
          </prclist1>
        </pgblk>
      </subject>
    </section>
  </chapter>
</book>

1 answer

User

#1 · Posted 2024-10-01 13:24:03

findall() does not accept absolute paths; you need a relative path.
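A minimal sketch of which path forms work when findall() is called on the root element. It uses the standard-library xml.etree.ElementTree (lxml's element-level findall() is believed to behave the same way); the trimmed XML and the titles() helper are illustrative, not part of the question.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the question's XML (DOCTYPE and most attributes dropped).
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<book chg="R" model="AB">
  <chapter chapnbr="09" key="EN49">
    <title>HOW TO WIN</title>
    <section chapnbr="09" sectnbr="102">
      <title>What a start</title>
      <subject subjnbr="00"><title>1.A</title></subject>
    </section>
  </chapter>
</book>"""

root = ET.fromstring(XML)  # root is the <book> element itself


def titles(path):
    """Return the text of every element that root.findall(path) matches."""
    return [e.text for e in root.findall(path)]


# Paths are resolved from root, so the first step names a child of <book>.
relative = titles("chapter/section/title")
explicit = titles("./chapter/section/title")        # same path with an explicit '.'
anywhere = titles(".//section/title")               # <section> at any depth
wrong_start = titles("book/chapter/section/title")  # [] because <book> is not its own child

# A leading '/' is rejected when findall() is called on an element.
try:
    root.findall("/book/chapter/section/title")
    absolute_error = None
except SyntaxError as exc:
    absolute_error = exc

print(relative, explicit, anywhere, wrong_start)
print("absolute path rejected:", absolute_error)
```

In the standard library, calling findall() on the tree object (ET.ElementTree) instead rewrites a leading '/' to './' and emits a FutureWarning, so '/book/chapter/section/title' then silently matches nothing rather than raising.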

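'.//section/title' does work, but it returns the title elements. So if the dict is keyed on each element's tag, you end up with a single key named title no matter how many matches there are, which is probably not what you want.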
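To illustrate the single-key problem, here is a sketch with a hypothetical two-section document (not the question's XML), using the standard-library ElementTree. The tag-keyed dict is an assumption about how the question built its dict.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two sections, so findall() returns two matches.
XML = b"""<book><chapter>
  <section sectnbr="101"><title>Intro</title></section>
  <section sectnbr="102"><title>What a start</title></section>
</chapter></book>"""

root = ET.fromstring(XML)
matches = root.findall(".//section/title")

# Keyed on the tag: every match has the tag 'title', so each one overwrites the last.
by_tag = {e.tag: e.text for e in matches}

# Keyed on the text: one entry per matched title.
by_text = {e.text: e for e in matches}

print(len(matches), by_tag, sorted(by_text))
```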

If you want to use the title text as an index for each section, you can do this:

# Requires lxml: getparent() is not available on xml.etree.ElementTree elements
d = {item.text: item.getparent() for item in root.findall('.//section/title')}

With the sample XML, this creates a dict with the single key 'What a start' and the section element as its value.
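getparent() is an lxml extension; standard-library ElementTree elements keep no parent pointer. A sketch of two stdlib alternatives, assuming a trimmed version of the sample: select the sections and read each one's title child, or use the '..' parent step that ElementTree's path syntax supports.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, keeping only the elements used here.
XML = b"""<book>
  <chapter>
    <title>HOW TO WIN</title>
    <section sectnbr="102">
      <title>What a start</title>
      <subject subjnbr="00"><title>1.A</title></subject>
    </section>
  </chapter>
</book>"""

root = ET.fromstring(XML)

# Option 1: select the <section> elements and read each one's direct <title> child.
# A section without a <title> child would be keyed as None.
d = {sec.findtext("title"): sec for sec in root.iterfind(".//section")}

# Option 2: '..' steps from each matched <title> back up to its parent <section>.
via_parent = root.findall(".//section/title/..")

for k, v in d.items():
    print(f"{k} -> {v.tag}")
print([e.tag for e in via_parent])
```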

If you want the full power of XPath expressions, I recommend using XPathEvaluator:

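from lxml import etree

# 'file.xml' is assumed to contain the XML from the question
tree = etree.parse('file.xml')

# An evaluator bound to the whole document accepts full XPath, including absolute paths
xev = etree.XPathEvaluator(tree)

# Map each section title's text to its parent <section> element
d = {item.text: item.getparent() for item in xev('/book/chapter/section/title')}

for k, v in d.items():
    print(f"{k} -> {v.tag}")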

Output:

What a start -> section
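For comparison, a sketch using the xpath() method that lxml also provides on trees: it takes the same absolute expression, and an XPath predicate can select the section elements directly, so no getparent() step is needed. It assumes lxml is installed and inlines a trimmed copy of the sample as bytes instead of reading file.xml.

```python
from lxml import etree

# Trimmed copy of the question's XML. Passed as bytes, because lxml rejects
# str input that carries an encoding declaration.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<book chg="R" model="AB">
  <chapter chapnbr="09" key="EN49">
    <title>HOW TO WIN</title>
    <section chapnbr="09" sectnbr="102">
      <title>What a start</title>
      <subject subjnbr="00"><title>1.A</title></subject>
    </section>
  </chapter>
</book>"""

root = etree.fromstring(XML)
tree = root.getroottree()

# tree.xpath() accepts the same absolute expression as the XPathEvaluator.
titles = tree.xpath("/book/chapter/section/title")

# The [title] predicate keeps only <section> elements that have a <title> child.
sections = tree.xpath("//section[title]")
d = {sec.findtext("title"): sec for sec in sections}

for k, v in d.items():
    print(f"{k} -> {v.tag}")
```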
