Python: searching for specific data in an XML file

Posted 2024-10-04 03:28:00

I have a very large XML file with the following structure:

  <processo numero="XXXXX" data-deposito="XXXXX">
    <despachos>
      <despacho codigo="XXXXX" nome="DATA TO GET"/>
    </despachos>
    <titulares>
      <titular nome-razao-social="XXXXX" pais="XX" uf="XX"/>
    </titulares>
    <marca apresentacao="XXXXX" natureza="XXXXX">
      <nome>NAME TO FIND</nome>
    </marca>
    <lista-classe-nice>
      <classe-nice codigo="XX">
        <especificacao>XXXXXXXXXX</especificacao>
        <status>XXXXX</status>
      </classe-nice>
    </lista-classe-nice>
  </processo>

I am using the following Python code to search for and print specific data:

from lxml import etree
 
with open("XML-FILE.xml",'rb') as f:
  file_content = f.read()
  tree = etree.fromstring(file_content)
# get all customer records
  customers = tree.xpath('//processo')
  for customer in customers:
      # note that xpath on text() returns a list
    despacho = customer.xpath('/despachos/despacho/text()')[0]
    nome = customer.xpath('/marca/nome/text()')[0]
    print(nome)
    print(despacho)

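I am trying to find the record whose <marca> element contains NAME TO FIND in its <nome>, and then print the data held in that same record's despacho element: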

    <despachos>
      <despacho codigo="XXXXX" nome="DATA TO GET"/>
    </despachos>

The problem is that I get no data back, and sometimes I get "IndexError: list index out of range".

Thanks for your help.


3 answers

Try the following XPath:

//processo[.//marca/nome[text()='NAME TO FIND']]//despacho/@nome

If more than one node matches this XPath, you will need to handle the results accordingly.
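To illustrate handling zero, one, or several matches, here is a minimal sketch using lxml, the library from the question. The `<revista>` root element, the sample values, and the function name `find_despachos` are assumptions made for the example; they do not come from the original file. The sketch also shows two likely causes of the IndexError in the question: a path that starts with `/` is evaluated from the document root rather than from the current `processo`, and `despacho` carries its data in the `nome` attribute, not in text content, so `text()` returns an empty list.

```python
from lxml import etree

# Sample data mirroring the structure in the question, wrapped in a
# hypothetical <revista> root element so several records can coexist.
XML = b"""<revista>
  <processo numero="1">
    <despachos>
      <despacho codigo="A1" nome="DATA TO GET"/>
    </despachos>
    <marca apresentacao="X" natureza="Y">
      <nome>NAME TO FIND</nome>
    </marca>
  </processo>
  <processo numero="2">
    <despachos>
      <despacho codigo="B2" nome="OTHER DATA"/>
    </despachos>
  </processo>
</revista>"""


def find_despachos(root, wanted):
    """Return the nome attribute of every despacho in matching records."""
    results = []
    for processo in root.xpath('//processo'):
        # A leading "./" keeps the search relative to this processo;
        # a leading "/" would restart from the document root.
        nomes = processo.xpath('./marca/nome/text()')
        if not nomes or nomes[0] != wanted:
            continue  # record has no <marca>, or the name differs
        # despacho has no text content, so read the attribute instead.
        results.extend(processo.xpath('./despachos/despacho/@nome'))
    return results


root = etree.fromstring(XML)
print(find_despachos(root, 'NAME TO FIND'))
```

Because the function checks for an empty list before indexing, a record without a `<marca>` element (like the second one above) is skipped instead of raising an error.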

This XPath should return the attributes of the despacho element:

//despachos[following-sibling::marca/nome[text()="NAME TO FIND"]]/despacho/@*

Test:

xmllint --xpath '//despachos[following-sibling::marca/nome[text()="NAME TO FIND"]]/despacho/@*' test.xml

Result:

codigo="XXXXX" nome="DATA TO GET"

To get only the value of the @nome attribute:

xmllint --xpath 'string(//despachos[following-sibling::marca/nome[text()="NAME TO FIND"]]/despacho/@nome)' test.xml ; echo

Result:

DATA TO GET
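The same two expressions can also be evaluated from Python. The following is a rough sketch with lxml, assuming a single sample record with made-up attribute values; the variable names `base` and `wanted` are illustrative only. It passes the name to search for as an XPath variable (`$wanted`) so the value does not have to be quoted inside the expression.

```python
from lxml import etree

# One sample record shaped like the XML in the question (values are made up).
XML = b"""<processo numero="1">
  <despachos>
    <despacho codigo="A1" nome="DATA TO GET"/>
  </despachos>
  <marca apresentacao="X" natureza="Y">
    <nome>NAME TO FIND</nome>
  </marca>
</processo>"""

root = etree.fromstring(XML)

# $wanted is an XPath variable; lxml fills it from the keyword argument.
base = '//despachos[following-sibling::marca/nome[text()=$wanted]]/despacho'

# "/@*" yields one string per attribute; attrname tells which attribute it is.
attrs = {a.attrname: str(a)
         for a in root.xpath(base + '/@*', wanted='NAME TO FIND')}

# string(...) returns a single string, which is empty when nothing matches.
nome = root.xpath('string(' + base + '/@nome)', wanted='NAME TO FIND')
missing = root.xpath('string(' + base + '/@nome)', wanted='NO SUCH NAME')

print(attrs)
print(nome)
```

Wrapping the expression in `string(...)` avoids indexing into a list, so a name that is not present yields an empty string rather than an IndexError.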

I recommend a simple library. Before using it, install it with: pip install -U simplified_scrapy

from simplified_scrapy import SimplifiedDoc

doc = SimplifiedDoc()
doc.loadFile('XML-FILE.xml',lineByline=True)

customers = doc.getIterable('processo')
for customer in customers:
    despacho = customer.select('despachos>despacho>nome()')
    nome = customer.select('marca>nome>text()')
    print(despacho)
    print(nome)

Result:

DATA TO GET
NAME TO FIND
