Extract from one match to the next if a pattern is found between the two matches

<Iteration>
 <Iteration_hit>Elememt1 Element1
 abc1
 hit 1
 .
 .
</Iteration>
<Iteration>
 <Iteration_hit>Elememt2 Element2
 abc2
 hit 1
 .
 .
</Iteration>
<Iteration>
 <Iteration_hit>Elememt3 Element3
 abc3
 hit 1
 .
 .
</Iteration>
<Iteration>
 <Iteration_hit>Elememt4 Element4
 abc4
 hit 1
 .
 .
</Iteration>

#!/usr/bin/python
x = raw_input("Enter your xml file name: ")
xml = open(x)
l = raw_input("Enter your list file name: ")
lst = open(l)
Id = list()
ylist = list()
import re
for line in lst:
    stuff=line.rstrip()
    stuff.split()
    Id.append(stuff)
for ele in Id:
    for line1 in xml:
        if line1.startswith(" <Iteration_hit>"):
            y = line1.split()
            # print y[1]
            if y[1] == ele:
                break

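2 answers

User

Answer 1 · edited 2024-10-01 00:31:03

Parsing XML with regular expressions is not recommended. Use a library such as lxml instead, which you can install with pip install lxml. You can then use lxml and XPath to select the elements you want to output, as shown below (I have taken the liberty of closing the <Iteration_hit> tags in your XML):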

content = '''
<root>
<Iteration>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt3 Element3
    abc3 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt4 Element4
    abc4 hit 1
  </Iteration_hit>
</Iteration>
</root>
'''

from lxml import etree

tree = etree.XML(content)
target_elements = tree.xpath('//Iteration_hit[contains(., "Element2") or contains(., "Element4")]')

for element in target_elements:
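    # encoding='unicode' returns str rather than bytes, so Python 3 prints plain text
    print(etree.tostring(element, encoding='unicode'))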

Output

<Iteration_hit>Elememt2 Element2
    abc2 hit 1
  </Iteration_hit>

<Iteration_hit>Elememt4 Element4
    abc4 hit 1
  </Iteration_hit>
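
Note that contains() is a substring test, so "Element2" would also match an entry such as "Element20". The following is a minimal sketch of an exact comparison on the second whitespace-separated token, mirroring the y[1] == ele check in the question. It swaps in the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages. The function name select_iterations and the "Element20" entry in the sample are assumptions made for illustration, not part of the original data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: same shape as the closed-tag XML above, plus an
# "Element20" entry added to show the difference from a substring match.
CONTENT = """
<root>
<Iteration>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt20 Element20
    abc20 hit 1
  </Iteration_hit>
</Iteration>
</root>
"""


def select_iterations(xml_text, wanted_ids):
    """Return <Iteration> elements whose hit's second token is in wanted_ids."""
    wanted = set(wanted_ids)
    root = ET.fromstring(xml_text)
    selected = []
    for iteration in root.iter("Iteration"):
        hit = iteration.find("Iteration_hit")
        if hit is None or hit.text is None:
            continue
        tokens = hit.text.split()
        # Exact comparison, so "Element2" does not also select "Element20"
        if len(tokens) > 1 and tokens[1] in wanted:
            selected.append(iteration)
    return selected


matches = select_iterations(CONTENT, ["Element2"])
for iteration in matches:
    # Serialize the whole <Iteration> block, not just the hit line
    print(ET.tostring(iteration, encoding="unicode"))
```

Returning the enclosing <Iteration> element rather than the <Iteration_hit> child gives the whole block from one marker to the next, which appears to be what the question is after.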

User

Answer 2 · edited 2024-10-01 00:31:03

Below is a complete script for parsing the XML with Python

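#!/usr/bin/python
from lxml import etree

# Parse the XML file once, outside the loop
tree = etree.parse('input.xml')

# Read the IDs to look for, one per line, skipping blank lines
with open('ID.list') as lst:
    ids = [line.strip() for line in lst if line.strip()]

for ele in ids:
    # Pass the ID as an XPath variable; a bare `ele` inside the XPath string
    # would be read as a child element name, not as the Python variable
    target_elements = tree.xpath('//Iteration[contains(., $ele)]', ele=ele)

    # Print the matches for this ID before moving on to the next one
    for element in target_elements:
        print(etree.tostring(element, encoding='unicode'))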
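
If the input file really has unclosed <Iteration_hit> tags, as in the question, it is not well-formed XML and a strict parser will reject it. In that case a plain line scan may be enough. The sketch below is one possible way to do it, assuming every block starts with a line beginning with <Iteration> and ends with a line beginning with </Iteration>, and that the ID is the second whitespace-separated token on the <Iteration_hit> line. The function name extract_blocks and the two-block sample are hypothetical.

```python
def extract_blocks(lines, wanted_ids):
    """Collect <Iteration>...</Iteration> blocks whose hit ID is in wanted_ids."""
    wanted = set(wanted_ids)
    blocks = []
    current = None   # lines of the block being read, or None when outside a block
    keep = False     # whether the current block contains a wanted ID
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("<Iteration>"):
            current, keep = [], False
        if current is None:
            continue  # skip anything outside a block
        current.append(line.rstrip("\n"))
        if stripped.startswith("<Iteration_hit>"):
            tokens = stripped.split()
            keep = len(tokens) > 1 and tokens[1] in wanted
        if stripped.startswith("</Iteration>"):
            if keep:
                blocks.append("\n".join(current))
            current = None
    return blocks


# Hypothetical sample in the shape of the question's input (tags left unclosed)
SAMPLE = """<Iteration>
 <Iteration_hit>Elememt1 Element1
 abc1
 hit 1
</Iteration>
<Iteration>
 <Iteration_hit>Elememt2 Element2
 abc2
 hit 1
</Iteration>
"""

blocks = extract_blocks(SAMPLE.splitlines(), ["Element2"])
for block in blocks:
    print(block)
```

With real files, an open file object can be passed as lines, and the IDs can be read from the list file into a set beforehand. Unlike the nested loops in the question, this reads the XML file only once.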
