Removing XML parent elements based on a child element condition in Python

Posted 2024-05-19 10:23:49


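I am trying to remove parent XML elements based on the text of specific child elements that contain a "nan" value. The input XML contains namespaces, which makes this more complicated than expected. I can remove selected child elements individually, but not their associated parent elements. So far I can only remove the "nan" values in gam:String elements, but I want to remove every child element whose text is "nan", together with its associated parent element.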

Below are the script I am using and the input and (desired) output XML. Any help is greatly appreciated.

Script:

from lxml import etree
import os 

path = "C:\\users\\mdl518\\Desktop\\"

### Removing "nan" values
tree = etree.parse(os.path.join(path, "metadata_info.xml"))

# Only gam:String elements are matched here, so "nan" values in other elements are left in place
for elem in tree.findall('.//{http://standards.org/iso/19115/-3/gam/1.0}String'):
    if elem.text == 'nan':
        parent = elem.getparent()
        parent.remove(elem)

with open(".//metadata_output.xml", "wb") as f:
    f.write(etree.tostring(tree, xml_declaration=True, encoding='utf-8'))  ## Removes elements with "nan" values
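
As a side note on the namespace handling, the Clark-notation lookup (`{uri}String`) can also be written with a prefix map. The following is a minimal sketch, not the asker's script: the miniature document and the names `XML`, `NS`, and `remaining` are made up for illustration, and the `gam` URI is copied from the `xmlns:gam` declaration in the input XML.

```python
from lxml import etree

# Hypothetical miniature document; only the gam namespace URI comes from the input XML.
XML = b"""<root xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
  <age><gam:String>nan</gam:String></age>
  <type><gam:String>The Metadata File</gam:String></type>
</root>"""

# Prefix map: the prefix used here only has to match the URI, not the prefix in the file.
NS = {"gam": "http://standards.org/iso/19115/-3/gam/1.0"}

root = etree.fromstring(XML)

# findall() returns a list, so removing elements while looping over it is safe.
for elem in root.findall(".//gam:String", namespaces=NS):
    if elem.text == "nan":
        elem.getparent().remove(elem)

remaining = [e.text for e in root.findall(".//gam:String", namespaces=NS)]
print(remaining)
```

This still only touches `gam:String`; the now-empty parent and the "nan" values in other namespaces are left in place, which is the behaviour described in the question.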

Input XML:

<?xml version='1.0' encoding='utf-8'?>
<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"
xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"
xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0"
xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0"
xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0"
xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0"
xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
  <mdl:metadataIdentifier>
    <mcc:MD_Identifier>
      <mnl:name>
        <mnl:type>
          <gam:String>The Metadata File</gam:String>
        </mnl:type>
        <mnl:description>
          <mcc:listing codeList="http://arcgis.com/codelist/ScopeCode" codeListValue="dataset"></mcc:listing>
        </mnl:description>
      </mnl:name>
      <mnl:address>
        <mnl:defaultLocale>
          <lan:location>nan</lan:location>
        </mnl:defaultLocale>
      </mnl:address>
      <lan:language>
        <lan:type>
          <lis:name>English</lis:name>
        </lan:type>
       </lan:language>
     </mcc:MD_Identifier>
     <mcc:contactInfo>
       <mdl:POC>
         <mnl:name>
           <lis:person>Tom</lis:person>
         </mnl:name>
         <mnl:age>
           <gam:String>nan</gam:String>
         </mnl:age>
         <mnl:status>
           <lis:employment>nan</lis:employment>
         </mnl:status>
       </mdl:POC>
     </mcc:contactInfo>
   </mdl:metadataIdentifier>
 </nas:metadata>

Desired output XML:

<?xml version='1.0' encoding='utf-8'?>
<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"
xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"
xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0"
xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0"
xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0"
xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0"
xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
  <mdl:metadataIdentifier>
    <mcc:MD_Identifier>
      <mnl:name>
        <mnl:type>
          <gam:String>The Metadata File</gam:String>
        </mnl:type>
        <mnl:description>
          <mcc:listing codeList="http://arcgis.com/codelist/ScopeCode" codeListValue="dataset"></mcc:listing>
        </mnl:description>
      </mnl:name>
      <lan:language>
        <lan:type>
          <lis:name>English</lis:name>
        </lan:type>
       </lan:language>
     </mcc:MD_Identifier>
     <mcc:contactInfo>
       <mdl:POC>
         <mnl:name>
           <lis:person>Tom</lis:person>
         </mnl:name>
       </mdl:POC>
     </mcc:contactInfo>
   </mdl:metadataIdentifier>
 </nas:metadata>

1 answer
User
Answer #1 · posted 2024-05-19 10:23:49

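This has to be done in two stages: first remove every node whose text is "nan", then find the empty nodes left behind by the first step and remove those too: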

# step 1 - remove nan nodes
for n in tree.xpath('//*[.="nan"]'):
    n.getparent().remove(n)

# step 2 - select empty nodes and remove them as well
empty = tree.xpath('//*[not(normalize-space())]')

for emp in empty:
    parent = emp.getparent()
    # an empty node and the empty node nested inside it are both in the list;
    # skip anything that has no parent left to detach from
    if parent is not None:
        parent.remove(emp)
print(etree.tostring(tree).decode())

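This should give you the desired output for the "nan" elements. Note that step 2 matches every element with no text content, including any that were already empty in the input.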
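
The two stages can be put together as a self-contained sketch. This is an illustration under stated assumptions rather than the answerer's exact code: the document is a hypothetical, trimmed-down version of the input XML, `prune_nan` and its `keep_attributed` flag are names invented here, and the extra XPath condition that protects empty elements carrying attributes (such as `mcc:listing`) is an assumption about what should count as "empty".

```python
from lxml import etree

# Hypothetical, trimmed-down version of the input document.
XML = b"""<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"
    xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"
    xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0"
    xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0"
    xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0"
    xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">
  <mnl:name>
    <lis:person>Tom</lis:person>
  </mnl:name>
  <mnl:address>
    <mnl:defaultLocale>
      <lan:location>nan</lan:location>
    </mnl:defaultLocale>
  </mnl:address>
  <mnl:age>
    <gam:String>nan</gam:String>
  </mnl:age>
  <mnl:description>
    <mcc:listing codeListValue="dataset"></mcc:listing>
  </mnl:description>
</nas:metadata>"""


def detach(nodes):
    """Remove each node from its parent, skipping nodes that have none (e.g. the root)."""
    for node in nodes:
        parent = node.getparent()
        if parent is not None:
            parent.remove(node)


def prune_nan(root, keep_attributed=True):
    # Stage 1: drop every element whose string value is exactly "nan".
    detach(root.xpath('//*[.="nan"]'))

    # Stage 2: drop elements that now hold no non-whitespace text.
    expr = '//*[not(normalize-space())]'
    if keep_attributed:
        # Assumption: an empty element that carries attributes, or contains
        # one that does, is still meaningful and should be kept.
        expr = '//*[not(normalize-space()) and not(descendant-or-self::*[@*])]'
    detach(root.xpath(expr))
    return root


result = prune_nan(etree.fromstring(XML))
kept = [etree.QName(e).localname for e in result.iter()]
print(kept)
print(etree.tostring(result).decode())
```

Because XPath matches on the element's string value, no namespace map is needed here. With `keep_attributed=False` the second stage behaves like the plain `//*[not(normalize-space())]` query and would also remove `mnl:description` and `mcc:listing`.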
