Python: how to remove nodes from XML based on a condition

Published 2024-09-30 07:35:27


I have an XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Reviews>
    <Review rid="1004293">
        <sentences>
            <sentence id="1004293:0">
                <text>Judging from previous posts this used to be a good place, but not any longer.</text>
                <Opinions>
            </sentence>
            <sentence id="1004293:1">
                <text>We, there were four of us, arrived at noon - the place was empty - and the staff acted like we were imposing on them and they were very rude.</text>
                <Opinions>
            </sentence>
            <sentence id="1004293:2">
                <text>They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.</text>
                <Opinions>
                    <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
                </Opinions>
            </sentence>
        </sentences>
    </Review>

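How can I remove the sentences that have no opinions and keep only the sentences whose Opinions element actually contains an Opinion? I would like to end up with something like this: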

<sentences>
        <sentence id="1004293:2">
            <text>They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.</text>
            <Opinions>
                <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0"/>
            </Opinions>
        </sentence>
    </sentences>

3 answers

Use the built-in XML library (ElementTree).

Note: the XML you posted is not well-formed (the empty Opinions elements are never closed), so I had to fix it.

import xml.etree.ElementTree as ET


xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Reviews>
   <Review rid="1004293">
      <sentences>
         <sentence id="1004293:0">
            <text>Judging from previous posts this used to be a good place, but not any longer.</text>
            <Opinions />
         </sentence>
         <sentence id="1004293:1">
            <text>We, there were four of us, arrived at noon - the place was empty - and the staff acted like we were imposing on them and they were very rude.</text>
            <Opinions />
         </sentence>
         <sentence id="1004293:2">
            <text>They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.</text>
            <Opinions>
               <Opinion target="NULL" category="SERVICE#GENERAL" polarity="negative" from="0" to="0" />
            </Opinions>
         </sentence>
      </sentences>
   </Review>
</Reviews>
'''

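root = ET.fromstring(xml)
# findall() returns a list, so every <sentences> container is handled,
# even when the file holds several reviews
for sentences_root in root.findall('.//sentences'):
    # Collect first, then remove: deleting children while iterating skips elements.
    # "is None" avoids the deprecated truth-value test on Element objects.
    sentences_with_no_opinions = [
        s for s in sentences_root.findall('sentence')
        if s.find('Opinions/Opinion') is None
    ]
    for s in sentences_with_no_opinions:
        sentences_root.remove(s)

# encoding='unicode' returns a str instead of bytes
print(ET.tostring(root, encoding='unicode'))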

Output (whitespace tidied for readability):

<Reviews>
   <Review rid="1004293">
      <sentences>
         <sentence id="1004293:2">
            <text>They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table.</text>
            <Opinions>
               <Opinion category="SERVICE#GENERAL" from="0" polarity="negative" target="NULL" to="0" />
            </Opinions>
         </sentence>
      </sentences>
   </Review>
</Reviews>
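
The same idea can be generalised to an arbitrary condition. The following is a minimal sketch, not part of the original answer: `remove_children` and `lacks_negative_opinion` are hypothetical helper names, and the sample XML is made up for illustration. It removes every sentence that has no Opinion with polarity "negative", across several reviews.

```python
import xml.etree.ElementTree as ET


def remove_children(root, container_tag, child_tag, should_remove):
    """Remove each <child_tag> directly under a <container_tag> for which
    should_remove(child) is true. Returns the number of removed elements."""
    removed = 0
    # list(...) takes a snapshot, so the tree can be modified safely below
    for container in list(root.iter(container_tag)):
        for child in list(container.findall(child_tag)):
            if should_remove(child):
                container.remove(child)
                removed += 1
    return removed


def lacks_negative_opinion(sentence):
    """Hypothetical condition: no <Opinion polarity="negative"> inside."""
    return not any(
        op.get("polarity") == "negative" for op in sentence.iter("Opinion")
    )


# Made-up sample data with two reviews
SAMPLE = """<Reviews>
  <Review rid="1"><sentences>
    <sentence id="1:0"><text>Fine.</text><Opinions/></sentence>
    <sentence id="1:1"><text>Rude staff.</text>
      <Opinions><Opinion polarity="negative"/></Opinions></sentence>
  </sentences></Review>
  <Review rid="2"><sentences>
    <sentence id="2:0"><text>Great food.</text>
      <Opinions><Opinion polarity="positive"/></Opinions></sentence>
  </sentences></Review>
</Reviews>"""

root = ET.fromstring(SAMPLE)
removed_count = remove_children(root, "sentences", "sentence", lacks_negative_opinion)
remaining_ids = [s.get("id") for s in root.iter("sentence")]
print(removed_count, remaining_ids)
```

Passing the condition as a function keeps the traversal and the rule separate, so a different filter only needs a different predicate.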

I would use a module that converts the XML to a dict (see, for example, "How to convert an xml string to a dictionary?"), filter out the unwanted nodes, and then convert the result back to XML.
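
The module this answer refers to is not named here, so the following is only a rough sketch of the dict round-trip idea using the standard library. All helper names (`element_to_dict`, `dict_to_element`, `has_opinion`, `drop_sentences_without_opinions`) are hypothetical, the dict layout is an assumption, and the input is a shortened sample.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Hypothetical helper: turn an Element into a plain nested dict."""
    return {
        "tag": elem.tag,
        "attrib": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


def dict_to_element(node):
    """Hypothetical helper: rebuild an Element from the dict form above."""
    elem = ET.Element(node["tag"], node["attrib"])
    elem.text = node["text"] or None
    for child in node["children"]:
        elem.append(dict_to_element(child))
    return elem


def has_opinion(sentence):
    """True if the sentence dict holds an <Opinions> with at least one child."""
    return any(
        child["tag"] == "Opinions" and child["children"]
        for child in sentence["children"]
    )


def drop_sentences_without_opinions(node):
    """Return a copy of the dict tree without opinion-less sentences."""
    kept = [
        drop_sentences_without_opinions(child)
        for child in node["children"]
        if not (child["tag"] == "sentence" and not has_opinion(child))
    ]
    return {**node, "children": kept}


# Shortened sample input
doc = ET.fromstring(
    '<sentences>'
    '<sentence id="1004293:0"><text>No opinion here.</text><Opinions/></sentence>'
    '<sentence id="1004293:2"><text>Has one.</text>'
    '<Opinions><Opinion polarity="negative"/></Opinions></sentence>'
    '</sentences>'
)
filtered = dict_to_element(drop_sentences_without_opinions(element_to_dict(doc)))
print(ET.tostring(filtered, encoding="unicode"))
```

This round trip discards the original whitespace between elements, which is one reason editing the tree in place is usually simpler.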

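Consider using XSLT, a language designed specifically for transforming XML documents. Specifically, run the identity transform, which copies every node unchanged, and add an empty template that matches the sentence elements meeting the required condition, so that those elements are dropped from the output.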

XSLT (save as an .xsl file, which is itself a special kind of .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>

    <!-- EMPTY TEMPLATE TO DELETE NODE(S) -->
    <xsl:template match="sentence[text and not(Opinions/*)]"/>

</xsl:stylesheet>


Python (using the third-party module lxml)

import lxml.etree as et 

doc = et.parse('/path/to/Input.xml') 
xsl = et.parse('/path/to/Script.xsl') 

# CONFIGURE TRANSFORMER 
transform = et.XSLT(xsl) 

# TRANSFORM SOURCE DOC 
result = transform(doc) 

# OUTPUT TO CONSOLE 
print(result) 

# SAVE TO FILE 
with open('Output.xml', 'wb') as f: 
   f.write(result)
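
For readers less familiar with XSLT, the logic of the stylesheet can be mirrored in plain Python. This is a minimal sketch using only the standard library, not a replacement for the lxml approach: `matches_empty_template` and `identity_copy` are hypothetical names, and the sample input is shortened. The copy function plays the role of the identity transform, and the predicate plays the role of the empty template `sentence[text and not(Opinions/*)]`.

```python
import xml.etree.ElementTree as ET


def matches_empty_template(elem):
    """Mirror of the match pattern sentence[text and not(Opinions/*)]."""
    return (
        elem.tag == "sentence"
        and elem.find("text") is not None       # has a <text> child
        and elem.find("Opinions/*") is None     # no element inside any <Opinions>
    )


def identity_copy(elem):
    """Copy elem recursively, skipping children matched by the predicate."""
    new = ET.Element(elem.tag, dict(elem.attrib))
    new.text = elem.text
    new.tail = elem.tail
    for child in elem:
        if matches_empty_template(child):
            continue  # the "empty template": emit nothing for this node
        new.append(identity_copy(child))
    return new


# Shortened sample input
source = ET.fromstring(
    '<sentences>'
    '<sentence id="1004293:1"><text>First.</text><Opinions/></sentence>'
    '<sentence id="1004293:2"><text>Second.</text>'
    '<Opinions><Opinion polarity="negative"/></Opinions></sentence>'
    '</sentences>'
)
result_tree = identity_copy(source)
print(ET.tostring(result_tree, encoding="unicode"))
```

Unlike in-place removal, this builds a new tree and leaves the source document untouched, which is also how an XSLT transformation behaves.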
