Parsing XML and accessing elements with Python

3 answers

User

Answer 1 · edited 2024-10-01 02:19:17

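First, please disregard my earlier comment. It turns out that lxml.etree is much better than the standard xml.etree.ElementTree, because it handles namespaces. The problem is that you are searching for '//Constant', which means the node can sit at any level. The root element, however, does not let you do that: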

>>> root.findall('//Constant')
SyntaxError: cannot use absolute path on element

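However, you can do this one level higher, by calling findall on the parsed tree object instead of on the root element, as the full listing below does with tree.findall('//Constant').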

Update

I am posting the full listing here. Since I don't have the complete XML file, I made up a few things to fill in the gaps.

from io import BytesIO

from lxml import etree as ET

# Python 3: pass bytes so the parser applies the declared utf-8 encoding itself.
xml_text = b"""<?xml version='1.0' encoding='utf-8' ?>

<rdf:root  xmlns:rdf='http://foo.bar.com/rdf'>
<rdf:RDF>
  <rdf:Description>
    DescriptionX
  </rdf:Description>
</rdf:RDF>
<rdf:foo>
        <MiriamAnnotation>
          bar
        </MiriamAnnotation>
        <ListOfSubstrates>
          <Substrate metabolite="Metabolite_5" stoichiometry="1"/>
        </ListOfSubstrates>
        <ListOfModifiers>
          <Modifier metabolite="Metabolite_9" stoichiometry="1"/>
        </ListOfModifiers>
        <ListOfConstants>
          <Constant key="Parameter_4344" name="Kcat" value="433.724"/>
          <Constant key="Parameter_4343" name="km" value="479.617"/>
        </ListOfConstants>
</rdf:foo>
</rdf:root>
"""

buffer = BytesIO(xml_text)
tree = ET.parse(buffer)
# findall is called on the tree, not on an element, so the absolute path is accepted.
for constant_node in tree.findall('//Constant'):
    print(constant_node.attrib['key'])
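
If you only have an element in hand, a relative path is another option. The sketch below is an illustration rather than part of the original answer: it uses the standard library's xml.etree.ElementTree (swapped in so it runs without lxml installed) and a made-up, namespace-free miniature document. It shows the absolute path being rejected on an element and './/Constant' finding the same nodes.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document, only for illustration.
xml_text = """<root>
  <ListOfConstants>
    <Constant key="Parameter_4344" name="Kcat" value="433.724"/>
    <Constant key="Parameter_4343" name="km" value="479.617"/>
  </ListOfConstants>
</root>"""

root = ET.fromstring(xml_text)

# An absolute path on an element raises SyntaxError.
try:
    root.findall('//Constant')
    absolute_path_error = None
except SyntaxError as exc:
    absolute_path_error = str(exc)

# A relative path starting with "." searches all descendants of the element.
keys = [node.attrib['key'] for node in root.findall('.//Constant')]
print(absolute_path_error)
print(keys)
```

Starting the path with a dot keeps the search anchored to the element you call it on, which is why it is allowed where '//Constant' is not.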

User

Answer 2 · edited 2024-10-01 02:19:17

Here is how to get the values you are looking for:

from lxml import etree

parsed = etree.parse('ct.cps')

# The tag is qualified with the document's default namespace in {uri}tag form.
for a in parsed.findall("//{http://www.copasi.org/static/schema}Constant"):
    print(a.attrib["key"])

This prints the key attribute of every Constant element in the file, one per line.

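The important point is that the COPASI root element in the XML file (the actual root element of the file at the Dropbox URL) declares a default namespace (http://www.copasi.org/static/schema). This means that the root element and all of its descendants, including Constant, belong to that namespace.

So instead of Constant, you need to look for {http://www.copasi.org/static/schema}Constant elements.

See http://lxml.de/tutorial.html#namespaces.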
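
To see the effect of a default namespace in isolation, here is a minimal sketch. It is an assumption-laden illustration: the miniature document is made up (only the COPASI root name and the namespace URI come from the question), and it uses the standard library's xml.etree.ElementTree so that it runs without lxml. The {uri}tag notation it relies on is the same one used with lxml above.

```python
import xml.etree.ElementTree as ET

NS = "http://www.copasi.org/static/schema"

# Made-up miniature document with a default namespace on the root element.
xml_text = """<COPASI xmlns="http://www.copasi.org/static/schema">
  <ListOfConstants>
    <Constant key="Parameter_4344" name="Kcat" value="433.724"/>
  </ListOfConstants>
</COPASI>"""

root = ET.fromstring(xml_text)

# The parser stores every tag in qualified {uri}tag form.
print(root.tag)

# The bare tag name matches nothing, because no element is named plain "Constant".
plain = root.findall('.//Constant')

# The qualified name matches.
qualified = root.findall('.//{%s}Constant' % NS)

print(len(plain), len(qualified))
```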

Here is how to do the same thing with XPath instead of findall:

from lxml import etree

# Map a prefix of your choice to the document's namespace URI.
NSMAP = {"c": "http://www.copasi.org/static/schema"}

parsed = etree.parse('ct.cps')

for a in parsed.xpath("//c:Constant", namespaces=NSMAP):
    print(a.attrib["key"])

See http://lxml.de/xpathxslt.html#namespaces-and-prefixes.

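User

Answer 3 · edited 2024-10-01 02:19:17

Don't use findall. It has a limited feature set and is designed for compatibility with ElementTree.

Instead, use xpath, which supports namespaces. From the above, it looks like you might want to say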

# Possibilities only; you need to get these right. The URIs must match the document.
# Assumes `parsed` is the tree returned by etree.parse(...).
ns_dict = {
    "atom": "http://www.w3.org/2005/Atom",
    "rdf": "http://www.w3.org/2000/01/rdf-schema#",
}

root = parsed.getroot()
for a in root.xpath('.//rdf:Constant', namespaces=ns_dict):
    print(a.attrib['key'])

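Note that whenever an element has a non-empty namespace, you must include a namespace prefix in the xpath expression, and each prefix must map to a namespace URL identical to the one used in the document.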
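
The point about prefixes can be checked with a small sketch. Two things here are assumptions: the miniature document is made up, and the code uses the standard library's findall with a namespaces mapping (chosen so it runs without lxml) rather than lxml's xpath, which takes the same kind of prefix-to-URL dictionary. What it shows is that the prefix in the query is arbitrary, while the URL must match exactly.

```python
import xml.etree.ElementTree as ET

URI = "http://www.copasi.org/static/schema"

# Made-up miniature document that declares the namespace under the prefix "c".
xml_text = """<c:COPASI xmlns:c="http://www.copasi.org/static/schema">
  <c:Constant key="Parameter_4344"/>
</c:COPASI>"""

root = ET.fromstring(xml_text)

# Same prefix as the document: matches.
same_prefix = root.findall('.//c:Constant', namespaces={'c': URI})

# Different prefix, same URL: still matches.
other_prefix = root.findall('.//x:Constant', namespaces={'x': URI})

# Same prefix, different URL (a hypothetical one): no match.
wrong_url = root.findall('.//c:Constant',
                         namespaces={'c': 'http://example.com/other'})

print(len(same_prefix), len(other_prefix), len(wrong_url))
```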

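Update

Since you posted the original document, I can see that no namespace is assigned to the elements you are looking for. The query works without any namespace mapping; I just tried it with your source document.

You don't need the namespaces, because the document itself does not specify a default namespace.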
