python lxml findall with multiple namespaces

<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.company.com/common/rsp/2012/07 RSP_EWS_V1.6.xsd">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>60</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>

>>> root.nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', None: 'http://www.company.com/common/rsp/2012/07'}
>>> nsmap['foo']=nsmap[None]
>>> nsmap.pop(None)
'http://www.company.com/common/rsp/2012/07'
>>> nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance', 'foo': 'http://www.company.com/common/rsp/2012/07'}
>>> tree.xpath("//MeasurementRecords", namespaces=nsmap)
[]
>>> tree.xpath('/foo:MeasurementRecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>]
>>> tree.xpath('/foo:MeasurementRecords/HistoryRecords', namespaces=nsmap)
[]

>>> tree.findall('//{http://www.company.com/common/rsp/2012/07}MeasurementRecords')
[]
>>> print root
<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>
>>> print tree
<lxml.etree._ElementTree object at 0x6ffffda5368>
>>> for node in tree.iter():
...     print node
...
<Element {http://www.company.com/common/rsp/2012/07}MeasurementRecords at 0x6ffffda5290>
<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x6ffffda5cf8>
<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x6ffffda5f38>
...etc...
>>> tree.findall("//HistoryRecords", namespaces=nsmap)
[]
>>> tree.findall("//foo:MeasurementRecords/HistoryRecords", namespaces=nsmap)
[]
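The first transcript re-keys the default namespace from `root.nsmap` by hand. A minimal sketch of the same idea as a reusable helper is below; `xpath_namespaces` is a hypothetical name, `"foo"` is just the placeholder prefix from the question, and the XML is a trimmed copy of the sample document. It assumes lxml's `xpath()` will not accept a `None` prefix (XPath 1.0 has no default element namespace), which is why the key has to be renamed, and it shows that every step of the path needs the prefix.

```python
from lxml import etree

# Trimmed copy of the question's sample document.
XML = """<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
  </HistoryRecords>
</MeasurementRecords>"""


def xpath_namespaces(element, default_prefix="foo"):
    """Return a copy of element.nsmap that can be passed to xpath().

    The default namespace (key None) is moved under default_prefix
    (a hypothetical placeholder; any unused prefix works).
    """
    nsmap = dict(element.nsmap)  # work on a copy of the in-scope namespaces
    default_uri = nsmap.pop(None, None)
    if default_uri is not None:
        nsmap[default_prefix] = default_uri
    return nsmap


root = etree.fromstring(XML)
ns = xpath_namespaces(root)
print(ns)

# An unprefixed step means "no namespace", so this finds nothing...
print(root.xpath("/foo:MeasurementRecords/HistoryRecords", namespaces=ns))  # []
# ...while prefixing every step finds the HistoryRecords element.
print(root.xpath("/foo:MeasurementRecords/foo:HistoryRecords", namespaces=ns))
```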

1 answer

User

#1 · Posted 2024-04-28 18:45:50

If you start with this:

>>> tree = etree.parse(open('data.xml'))
>>> root = tree.getroot()
>>>

This will fail to find any elements:

>>> root.findall('{http://www.company.com/common/rsp/2012/07}MeasurementRecords')
[]

…but that's because root is a MeasurementRecords element; it does not contain any MeasurementRecords elements. On the other hand, the following works just fine:

>>> root.findall('{http://www.company.com/common/rsp/2012/07}HistoryRecords')
[<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x7fccd0332ef0>]
>>>
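To make the point about `root` concrete, here is a small sketch on a trimmed copy of the sample XML (`NS` is a hypothetical helper constant holding the `{uri}` part). It assumes nothing beyond lxml's standard `find()`/`findall()` path syntax: paths are evaluated relative to the element they are called on, `.//` searches descendants only, and each path step needs its own namespace.

```python
from lxml import etree

# Trimmed copy of the question's sample document.
XML = """<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>60</Value>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>"""

NS = "{http://www.company.com/common/rsp/2012/07}"  # hypothetical helper constant

root = etree.fromstring(XML)
print(root.tag == NS + "MeasurementRecords")  # True: root IS the MeasurementRecords element

# './/' searches root's descendants, so root itself can never match.
print(root.findall(".//" + NS + "MeasurementRecords"))  # []
print(len(root.findall(".//" + NS + "HistoryRecord")))  # 1

# Every step needs its own namespace; an unqualified step means "no namespace".
print(root.findall(NS + "HistoryRecords/ValueItemId"))  # []
print(len(root.findall(NS + "HistoryRecords/" + NS + "ValueItemId")))  # 1
```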

Using the xpath method, you can do something like this:

>>> nsmap={'a': 'http://www.company.com/common/rsp/2012/07',
... 'b': 'http://www.w3.org/2001/XMLSchema-instance'}
>>> root.xpath('//a:HistoryRecords', namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}HistoryRecords at 0x7fccd0332ef0>]
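If the same query runs many times, lxml can also compile it once with `etree.XPath`, binding the prefix mapping at compile time. This is a sketch on a trimmed copy of the sample XML; `find_value_item_ids` is a hypothetical name, and the prefix `a` matches the mapping used above.

```python
from lxml import etree

# Trimmed copy of the question's sample document.
XML = """<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
  </HistoryRecords>
</MeasurementRecords>"""

NSMAP = {"a": "http://www.company.com/common/rsp/2012/07"}

# Compiled once; the resulting object is called with an element or tree.
find_value_item_ids = etree.XPath("//a:HistoryRecords/a:ValueItemId", namespaces=NSMAP)

root = etree.fromstring(XML)
ids = [el.text for el in find_value_item_ids(root)]
print(ids)  # ['100_0000100004_3788_Resource-0.customId_WSx Data Precip Type']
```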

So:

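- The findall and find methods require the {...namespace...}ElementName syntax.
- The xpath method requires namespace prefixes (ns:ElementName), which it looks up in the provided namespaces map. The prefix doesn't have to match the prefix used in the original document, but the namespace URL must match.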
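The two syntaxes side by side, as a sketch on a trimmed copy of the sample XML: the `{uri}` form for `find()`, and two different, arbitrary prefixes for `xpath()` that both resolve because only the URI matters. One nuance worth knowing: lxml's `find()`/`findall()` also take an optional `namespaces` mapping, so a prefixed name works there too when a mapping is supplied. The prefixes `a`, `rsp`, and `r` are arbitrary choices.

```python
from lxml import etree

# Trimmed copy of the question's sample document.
XML = """<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
  </HistoryRecords>
</MeasurementRecords>"""

RSP = "http://www.company.com/common/rsp/2012/07"
root = etree.fromstring(XML)

# find()/findall(): namespace URI written inline as {uri}localname.
via_clark = root.find("{%s}HistoryRecords" % RSP)

# xpath(): prefixes are looked up in the mapping; their names are arbitrary.
via_a = root.xpath("a:HistoryRecords", namespaces={"a": RSP})
via_rsp = root.xpath("rsp:HistoryRecords", namespaces={"rsp": RSP})

# find()/findall() in lxml also accept an optional prefix mapping.
via_prefix = root.find("r:HistoryRecords", namespaces={"r": RSP})

for el in (via_clark, via_a[0], via_rsp[0], via_prefix):
    print(el.tag)  # {http://www.company.com/common/rsp/2012/07}HistoryRecords
```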

So this works:

>>> root.find('{http://www.company.com/common/rsp/2012/07}HistoryRecords/{http://www.company.com/common/rsp/2012/07}ValueItemId')
<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x7fccd0332a70>
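Repeating the URI in every step gets long. One way to keep it readable is to build the `{uri}localname` strings with lxml's `etree.QName`. This is a sketch on a trimmed copy of the sample XML; the helper `q` and the constant `RSP` are hypothetical names.

```python
from lxml import etree

# Trimmed copy of the question's sample document.
XML = """<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
  </HistoryRecords>
</MeasurementRecords>"""

RSP = "http://www.company.com/common/rsp/2012/07"


def q(tag):
    """Hypothetical helper: return '{RSP}tag' (Clark notation) for a tag in RSP."""
    return etree.QName(RSP, tag).text


root = etree.fromstring(XML)

# Same path as above, assembled step by step.
path = "/".join(q(t) for t in ("HistoryRecords", "ValueItemId"))
value_item = root.find(path)
print(path)
print(value_item.text)  # 100_0000100004_3788_Resource-0.customId_WSx Data Precip Type
```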

Or this works:

>>> root.xpath('/a:MeasurementRecords/a:HistoryRecords/a:ValueItemId',namespaces=nsmap)
[<Element {http://www.company.com/common/rsp/2012/07}ValueItemId at 0x7fccd0330830>]
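Once the namespaced paths resolve, pulling the actual values out is straightforward. A sketch on a trimmed copy of the sample XML, assuming the same `a` prefix mapping as above: XPath selects each HistoryRecord, `findtext()` (with its optional `namespaces` argument) reads the children, and an XPath `text()` step can also return the strings directly. The dictionary keys are hypothetical.

```python
from lxml import etree

# Trimmed copy of the question's sample document.
XML = """<MeasurementRecords xmlns="http://www.company.com/common/rsp/2012/07">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>60</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>"""

NSMAP = {"a": "http://www.company.com/common/rsp/2012/07"}
root = etree.fromstring(XML)

records = []
for rec in root.xpath("/a:MeasurementRecords/a:HistoryRecords/a:List/a:HistoryRecord",
                      namespaces=NSMAP):
    # findtext() returns the child's text, or None if the child is missing.
    records.append({
        "value": rec.findtext("a:Value", namespaces=NSMAP),
        "state": rec.findtext("a:State", namespaces=NSMAP),
        "timestamp": rec.findtext("a:TimeStamp", namespaces=NSMAP),
    })
print(records)

# XPath can also return the text nodes themselves as a list of strings.
values = root.xpath("//a:HistoryRecord/a:Value/text()", namespaces=NSMAP)
print(values)  # ['60']
```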
