Removing all content and child elements of an XML node with ElementTree

Posted 2024-06-28 20:09:20


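I have an XML file and want to delete everything inside nodes that carry a given attribute=value, but I cannot get the ElementTree remove() method to work. I get the error list.remove(x): x not in list.

If I have a div containing several paragraph and list elements, with the attributes v1-9 and deleted, I want to be able to delete the entire div and all of its content.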

import xml.etree.ElementTree as ET
#get target file
tree = ET.parse('tested.htm')
#pull into element tree
root = tree.getroot()
#confirm output
print(root)
#define xmlns tags
MadCap = {'MadCap': 'http://www.madcapsoftware.com/Schemas/MadCap.xsd'}

i=1
j=6

# specify state
state = "state.deleted-in-vers"
# specify version
vers = "version-number.v{}-{}".format(i,j)
# combine to get conditional string might need to double up b/c of order mattering here???
search = ".//*[@MadCap:conditions='{},{}']".format(vers,state)
#get matching elements
for elem in root.findall(search, MadCap):
    print('---PARENT---')
    print(elem)
    print('attributes:', elem.attrib)
    print('text:', elem.text)
    elem.text = " "
    elem.attrib = {}
    for child in elem.iter():
        print('-child element-')
        print(child)
        elem.remove(child)
print('==========')

For simplicity, I have omitted the loops over i and j above.

Below is a fragment of the target XML, showing how these attributes are used.

<div MadCap:conditions="state.deleted-in-vers,version-number.v1-9"> 
                              <h4>Example with password prompts</h4> 
                              <p>In the following example:</p> 
                              <ul> 
                                  <li>We have included the value <code>connection.ask-pass</code>, so are being prompted for the password of the setup user. </li> 
                                  <li>This host has an installation user <code>hub-setup</code>. </li> 
                                  <li>We are installing to the host <code>hub.example.com</code>. We must provide the FQDN of the host.</li> 
                                  <li>The KeyStore we are installing to the <MadCap:variable name="Components/gateway-hub.gateway-hub-name" /> hosts is located at <code>/tmp/ssl_keystore</code> on the installation machine.</li> 
                                  <li>The TrustStore we are installing to the <MadCap:variable name="Components/gateway-hub.gateway-hub-name" /> hosts is located at <code>/tmp/ssl_truststore</code> on the installation machine.</li> 
                                  <li>We are not providing any of the password key-value pairs, and therefore are being prompted for the passwords. </li> 
                                  <li>This host has a runtime user <code>hub</code>.<ul><li>The runtime user is in group <code>gateway-hub</code>.</li></ul></li> 
                              </ul> 
                              <p>The <MadCap:variable name="3rd-party-products/formats.json-name" /> configuration file is the following:</p><pre xml:space="preserve">{ 
      "connection": { 
          "ask_pass": true, 
          "user": "hub-setup" 
      }, 
      "hosts": ["hub.example.com"], 
      "hub": {<MadCap:conditionalText MadCap:conditions="state.new-in-vers,version-number.v1-6"> 
          "user" : "hub", 
          "group" : "gateway-hub",</MadCap:conditionalText> 
          "ssl": { 
              "key_store": "/tmp/ssl_keystore", 
              "trust_store": "/tmp/ssl_truststore" 
          } 
      }<MadCap:conditionalText MadCap:conditions="version-number.v1-6,state.deleted-in-vers"> 
      "ansible" : {  
          "variables" : {  
              "hub_user": "hub",  
              "hub_group": "gateway-hub" 
          }  
      }</MadCap:conditionalText> 
  }</pre> 
                          </div> 
                          <div MadCap:conditions="state.deleted-in-vers,version-number.v1-9"> 
                              <h4>Example using SSH key</h4> 
                              <p>In the next example:</p> 
                              <ul> 
                                  <li>The SSH key for the setup user is located at <code>~/.ssh/HUB-SETUP-KEY.pem</code> on the installation machine, specified with <code>connection.private_key</code>. </li> 
                                  <li>The hosts have an installation user <code>hub-setup</code>. We must provide the FQDN of the host.</li> 
                                  <li>The hosts are specified in a list in a newline-delimited file at <code>/tmp/hosts</code> on the installation machine. </li> 
                                  <li>The KeyStore we are installing to the <MadCap:variable name="Components/gateway-hub.gateway-hub-name" /> hosts is located at <code>/tmp/ssl_keystore</code> on the installation machine.</li> 
                                  <li>The TrustStore we are installing to the <MadCap:variable name="Components/gateway-hub.gateway-hub-name" /> hosts is located at <code>/tmp/ssl_truststore</code> on the installation machine.</li> 
                                  <li>We are providing the passwords.</li> 
                                  <li>There is a runtime user on every host called <code>hub</code>.<ul><li>The runtime user is in group <code>gateway-hub</code>.</li></ul></li> 
                              </ul> 
                              <p>The <MadCap:variable name="3rd-party-products/formats.json-name" /> configuration file is the following:</p><pre xml:space="preserve">{ 
      "connection": { 
          "private_key": "~/.ssh/HUB-SETUP-KEY.pem", 
          "user": "hub-setup" 
      }, 
      "hosts_file": "/tmp/hosts", 
      "hub": {<MadCap:conditionalText MadCap:conditions="state.new-in-vers,version-number.v1-6"> 
          "user" : "hub", 
          "group" : "gateway-hub",</MadCap:conditionalText> 
          "ssl": { 
              "key_store": "/tmp/ssl_keystore", 
                      "key_store_password" "hub123",  
              "trust_store": "/tmp/ssl_truststore", 
              "trust_store_password": "hub123", 
              "key_password": "hub123" 
          } 
      }<MadCap:conditionalText MadCap:conditions="version-number.v1-6,state.deleted-in-vers"> 
      "ansible" : {  
          "variables" : {  
              "hub_user": "hub",  
              "hub_group": "gateway-hub"  
          }  
      }</MadCap:conditionalText> 
  }</pre> 
                          </div>

1 answer
User
#1 · Posted 2024-06-28 20:09:20

I find this task easier with lxml, because removing elements is simpler there.

Try the following code:

from lxml import etree as et

def remove_element(el):
    """Remove el from its parent, keeping the tail text that follows it."""
    parent = el.getparent()
    # el.tail is None when no text follows the element
    if el.tail and el.tail.strip():
        prev = el.getprevious()
        if prev is not None:
            prev.tail = (prev.tail or '') + el.tail
        else:
            parent.text = (parent.text or '') + el.tail
    parent.remove(el)

# Read source XML
parser = et.XMLParser(remove_blank_text=True)
tree = et.parse('Input.xml', parser)
root = tree.getroot()
# Replace the below namespace with your proper one
ns = {'mc': 'http://dummy.com'}
# Processing
for it in root.findall('.//*[@mc:conditions]', ns):
    attr = it.attrib
    # Attributes as plain text, for tracing only
    attrTxt = ', '.join([ f'{key}: {value}'
        for key, value in attr.items() ])
    print(f'Elem.: {et.QName(it).localname:6}: {attrTxt}')
    delFlag = False
    cond = attr.get('{http://dummy.com}conditions')
    if cond:
        # Split on commas, then split each part at its first dot
        dct = { k: v for k, v in (x.strip().split('.', 1)
            for x in cond.split(',')) }
        vn = dct.get('version-number')
        # Fall back to '' so a missing state does not raise
        st = dct.get('state') or ''
        if vn == 'v1-6' and st.startswith('deleted'):
            delFlag = True
        print(f"    {vn}, {st:15}  {'Delete' if delFlag else 'Keep'}")
        if delFlag:
            remove_element(it)
# Print the result
print(et.tostring(tree, method='xml',
    encoding='unicode', pretty_print=True))

Of course, in the final version you should add code that saves this tree to an output file.
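As a rough illustration of that last step, here is a minimal sketch that writes a tree to a file. It uses the standard-library xml.etree.ElementTree rather than lxml so that it runs without extra packages. The URI http://dummy.com, the file name Output.xml and the helper save_tree are placeholders for illustration, not names from the original answer.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption); replace it with the real MadCap one.
MADCAP_NS = "http://dummy.com"
# Ask ElementTree to serialize this namespace with the MadCap prefix
# instead of an auto-generated one such as ns0.
ET.register_namespace("MadCap", MADCAP_NS)

def save_tree(tree, path):
    """Write the tree to path as UTF-8, with an XML declaration."""
    tree.write(path, encoding="utf-8", xml_declaration=True)

source = (
    '<main xmlns:MadCap="http://dummy.com">'
    '<p MadCap:conditions="state.new-in-vers">x</p>'
    '</main>'
)
tree = ET.ElementTree(ET.fromstring(source))

# Write into a temporary directory so the sketch leaves no stray files.
out_path = os.path.join(tempfile.mkdtemp(), "Output.xml")
save_tree(tree, out_path)

with open(out_path, encoding="utf-8") as f:
    saved = f.read()
```

With lxml, the tree returned by et.parse has a write method that accepts the same encoding and xml_declaration arguments, so the call should carry over largely unchanged.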

To get well-formed XML with a single root element, I wrapped your content in:

<main xmlns:MadCap="http://dummy.com">
   ...
</main>

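Edit

In my previous solution I used it.getparent().remove(it) to remove the elements in question. I later found a flaw: if the source XML contains "mixed content", the "tail" text following the removed element is deleted as well (and it should not be).

To prevent this, I added the remove_element function, which removes only the element itself, and I call it instead of the previous it.getparent().remove(it).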
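The tail problem can be reproduced in a few lines. The sketch below uses the standard-library xml.etree.ElementTree (which has the same text/tail model as lxml) so it runs without extra packages. The helper remove_keep_tail is a hypothetical stand-in for remove_element, written for an API that has no getparent() or getprevious().

```python
import xml.etree.ElementTree as ET

def remove_keep_tail(parent, el):
    """Remove el from parent, moving el.tail to the previous sibling or to parent.text."""
    if el.tail:
        idx = list(parent).index(el)
        if idx > 0:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + el.tail
        else:
            parent.text = (parent.text or "") + el.tail
    parent.remove(el)

# Naive removal: the tail " this tail." disappears together with <b>.
naive = ET.fromstring("<p>Keep <b>drop</b> this tail.</p>")
naive.remove(naive.find("b"))
naive_result = ET.tostring(naive, encoding="unicode")

# Tail-preserving removal: only <b> and its own content are dropped.
safe = ET.fromstring("<p>Keep <b>drop</b> this tail.</p>")
remove_keep_tail(safe, safe.find("b"))
safe_result = ET.tostring(safe, encoding="unicode")

print(naive_result)  # <p>Keep </p>
print(safe_result)   # <p>Keep  this tail.</p>
```

In the sample from the question this matters inside the pre blocks, where the closing brace of the JSON is the tail of a MadCap:conditionalText element.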

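Explanation, following a question in the comments

The source of attrTxt is the content of the attr dictionary (the attributes of the current element). This snippet simply prints that dictionary without the curly braces. It is used only for tracing and is not needed afterwards.

dct, on the other hand, plays a more important role. Its source is cond, which holds the content of the conditions attribute of the current element, e.g. state.new-in-vers,version-number.v1-6.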

This code:

  • splits the content on commas
  • splits each of the resulting parts at the dot
  • creates a dictionary from these pairs
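The three steps above can be sketched as a small standalone function. The name parse_conditions is hypothetical, and skipping parts that contain no dot is an assumption made here for robustness, not something the original code does.

```python
def parse_conditions(cond):
    """Turn 'state.new-in-vers,version-number.v1-6' into a dict of key/value pairs."""
    result = {}
    for part in cond.split(","):                         # 1. split on commas
        key, sep, value = part.strip().partition(".")    # 2. split at the first dot
        if sep:                                          # skip parts without a dot (assumption)
            result[key] = value                          # 3. collect the pairs
    return result

a = parse_conditions("state.new-in-vers,version-number.v1-6")
b = parse_conditions("version-number.v1-6,state.new-in-vers")

print(a)        # {'state': 'new-in-vers', 'version-number': 'v1-6'}
print(a == b)   # True: the order of the fragments does not matter
```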

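Then vn receives the version number (v1-6) and st the state (new-in-vers). This is where the essential logic lives. Because the two fragments can appear in either order, a single exact-match XPath expression on the attribute value cannot cover all possible cases. Once you inspect these variables, however, it is clear whether the element should be removed.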
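For readers who want to stay with the standard library, the same order-independent check can be sketched with xml.etree.ElementTree. It has no getparent(), so the sketch builds a child-to-parent map first. The namespace URI http://dummy.com and the names should_delete and strip_deleted are assumptions for illustration. This is one possible approach, not the method from the answer above.

```python
import xml.etree.ElementTree as ET

MADCAP_NS = "http://dummy.com"  # placeholder (assumption); use the real MadCap URI
COND_ATTR = f"{{{MADCAP_NS}}}conditions"

def should_delete(cond, version):
    """True if cond names both the given version and the deleted state, in any order."""
    parts = {p.strip() for p in cond.split(",")}
    return f"version-number.{version}" in parts and "state.deleted-in-vers" in parts

def strip_deleted(root, version):
    """Remove every matching element below root; return how many were removed."""
    # ElementTree elements do not know their parent, so record it up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Collect targets first so the tree is not modified while iterating over it.
    targets = [el for el in root.iter()
               if el in parent_of and should_delete(el.get(COND_ATTR, ""), version)]
    for el in targets:
        parent = parent_of[el]
        if el.tail:
            # Keep the text that follows the element (mixed content).
            idx = list(parent).index(el)
            if idx > 0:
                parent[idx - 1].tail = (parent[idx - 1].tail or "") + el.tail
            else:
                parent.text = (parent.text or "") + el.tail
        parent.remove(el)
    return len(targets)

doc = (
    '<main xmlns:MadCap="http://dummy.com">'
    '<div MadCap:conditions="state.deleted-in-vers,version-number.v1-6">a</div>'
    '<div MadCap:conditions="version-number.v1-6,state.deleted-in-vers">b</div>'
    '<div MadCap:conditions="state.new-in-vers,version-number.v1-6">c</div>'
    '</main>'
)
root = ET.fromstring(doc)
removed = strip_deleted(root, "v1-6")
remaining = [el.text for el in root]
print(removed, remaining)  # 2 ['c']
```

Removing through the direct parent also avoids the list.remove(x): x not in list error from the question, which arises because elem.iter() yields elem itself and nested descendants that are not direct children of elem.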
