Extract from one match to the next if a pattern is found between the two matches

Posted 2024-10-01 00:31:03


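I am a beginner in Python. I am struggling with a problem, which is explained below. I am also sharing an incomplete Python script, which does not yet work for this problem. I would appreciate any support or guidance with my script.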

The file looks like this:

<Iteration>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
  .
  .
</Iteration>
<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  .
  .
</Iteration>
<Iteration>
  <Iteration_hit>Elememt3 Element3
    abc3 hit 1
  .
  .
</Iteration>
<Iteration>
  <Iteration_hit>Elememt4 Element4
    abc4 hit 1
  .
  .
</Iteration>

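For each element in a list of elements to match, I need to extract everything from <Iteration> to </Iteration>. This means that for Element2 and Element4 the output file should look like this: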

<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  .
  .
</Iteration>
<Iteration>
  <Iteration_hit>Elememt4 Element4
    abc4 hit 1
  .
  .
</Iteration>

Script

#!/usr/bin/python
x = raw_input("Enter your xml file name: ")
xml = open(x)
l = raw_input("Enter your list file name: ")
lst = open(l)
Id = list()
ylist = list()
import re
for line in lst:
        stuff=line.rstrip()
        stuff.split()
        Id.append(stuff)
for ele in Id:
        for line1 in xml:
                if line1.startswith("  <Iteration_hit>"):
                        y = line1.split()
#                       print y[1]
                        if y[1] == ele: break

2 Answers

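Using regex to parse XML is not recommended; you should use a library such as lxml, which you can install with pip install lxml. You can then use lxml's XPath support to select the appropriate elements to output, as shown below (I have taken the liberty of closing the <Iteration_hit> tags in the XML and wrapping everything in a single <root> element):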

content = '''
<root>
<Iteration>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt3 Element3
    abc3 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt4 Element4
    abc4 hit 1
  </Iteration_hit>
</Iteration>
</root>
'''

from lxml import etree

tree = etree.XML(content)
target_elements = tree.xpath('//Iteration_hit[contains(., "Element2") or contains(., "Element4")]')

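for element in target_elements:
    # encoding="unicode" makes tostring() return str instead of bytes under Python 3
    print(etree.tostring(element, encoding="unicode"))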

Output

<Iteration_hit>Elememt2 Element2
    abc2 hit 1
  </Iteration_hit>

<Iteration_hit>Elememt4 Element4
    abc4 hit 1
  </Iteration_hit>
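
The XPath above returns only the <Iteration_hit> elements, while the question asks for the whole <Iteration> block. As a minimal sketch of that variant, the following uses the standard library's xml.etree.ElementTree rather than lxml (a swapped-in module, so no install is needed) and compares the second whitespace-separated token exactly, so that "Element2" would not also match something like "Element20". It assumes the same cleaned-up, well-formed content as above; the names `wanted` and `selected` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Well-formed sample, shortened; assumes closed <Iteration_hit> tags and one root.
content = """<root>
<Iteration>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  </Iteration_hit>
</Iteration>
<Iteration>
  <Iteration_hit>Elememt4 Element4
    abc4 hit 1
  </Iteration_hit>
</Iteration>
</root>"""

wanted = {"Element2", "Element4"}  # hypothetical list of IDs to keep

root = ET.fromstring(content)
selected = []
for iteration in root.iter("Iteration"):
    hit = iteration.find("Iteration_hit")
    tokens = (hit.text or "").split() if hit is not None else []
    # Exact token comparison on the second token of the hit text.
    if len(tokens) > 1 and tokens[1] in wanted:
        # Serialize the whole <Iteration> element, not just the hit.
        selected.append(ET.tostring(iteration, encoding="unicode"))

for block in selected:
    print(block)
```

The trade-off is that exact token matching depends on the ID always being the second token of the hit text; if the layout varies, the substring-based XPath in the answer is more forgiving.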

Below is the complete script needed to parse the XML with Python

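#!/usr/bin/python
from lxml import etree

# Assumes input.xml is well-formed: a single root element and closed
# <Iteration_hit> tags, as in the cleaned-up sample in the answer above.
tree = etree.parse('input.xml')

# Read one ID per line, skipping blank lines.
with open('ID.list') as lst:
    ids = [line.strip() for line in lst if line.strip()]

for ele in ids:
    # Pass the ID as an XPath variable; a bare `ele` inside the expression
    # would be read as a child element name, not the Python variable.
    target_elements = tree.xpath('//Iteration[contains(., $ele)]', ele=ele)
    for element in target_elements:
        print(etree.tostring(element, encoding='unicode'))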
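
If the file is exactly as posted in the question (no single root element and unclosed <Iteration_hit> tags), an XML parser will most likely reject it. For that case, a minimal line-based sketch may be enough. It assumes each block starts with a line containing only `<Iteration>` and ends with `</Iteration>`, and that the ID is the second whitespace-separated token on the `<Iteration_hit>` line. The function name `extract_blocks` and the inline `sample` are hypothetical and only for illustration.

```python
def extract_blocks(lines, wanted):
    """Yield each <Iteration>...</Iteration> block whose hit line names a wanted ID."""
    block = []      # lines of the block currently being read
    keep = False    # becomes True once the hit line matches a wanted ID
    inside = False  # True while between <Iteration> and </Iteration>
    for line in lines:
        stripped = line.strip()
        if stripped == "<Iteration>":
            block, keep, inside = [line], False, True
            continue
        if not inside:
            continue
        block.append(line)
        if stripped.startswith("<Iteration_hit>"):
            tokens = stripped.split()
            # tokens[0] is e.g. "<Iteration_hit>Elememt2", tokens[1] is "Element2"
            keep = len(tokens) > 1 and tokens[1] in wanted
        elif stripped == "</Iteration>":
            if keep:
                yield "".join(block)
            inside = False


# Shortened sample in the same shape as the file in the question.
sample = """<Iteration>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
</Iteration>
<Iteration>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
</Iteration>
"""

blocks = list(extract_blocks(sample.splitlines(keepends=True), {"Element2"}))
print("".join(blocks), end="")
```

With real files, the same function can be fed an open file object for the XML and a set built from the lines of the ID list, since it only needs an iterable of lines and a collection of IDs.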
