是否删除子标记,但将文本保留为xml格式?

2024-10-06 07:04:18 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个xml,看起来像这样

<?xml version='1.0' encoding='utf8'?>
<all>
<articletitle>text1<x> </x></articletitle>
<affiliation><x> </x><label id="aff1">12</label><affnorg>College of Materials Science and Engineering</affnorg><x>, </x><affnorg>Guangdong Research Center for Interfacial Engineering of Functional Materials</affnorg><x>, </x><affnorg>Shenzhen University</affnorg><x>, </x><affnadd>3688 Nanhai Ave</affnadd><x>, </x><affncity>Shenzhen</affncity><x>, </x><affnpost>518060</affnpost><x>, </x><affncountry>PR China</affncountry><x>.</x></affiliation>
<affiliation><x> </x><label id="aff2">2</label><affnorg>Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province</affnorg><x>, </x><affnorg>College of Optoelectronic Engineering</affnorg><x>, </x><affnorg>Shenzhen University</affnorg><x>, </x><affnadd>3688 Nanhai Ave</affnadd><x>, </x><affncity>Shenzhen</affncity><x>, </x><affnpost>518060</affnpost><x>, </x><affncountry>PR China</affncountry><x>.</x></affiliation>
</all>

任务是,我必须删除所有的<x>标记,并仅在affiliation标记中保留它们的文本,使用ElementTree我可以删除标记,但它也会删除文本,但我希望该文本位于父标记中,因此我的新xml如下所示

<?xml version='1.0' encoding='utf8'?>
<all>
<articletitle>text1<x> </x></articletitle>
<affiliation> <label id="aff1">12</label><affnorg>College of Materials Science and Engineering</affnorg>, <affnorg>Guangdong Research Center for Interfacial Engineering of Functional Materials</affnorg>, <affnorg>Shenzhen University</affnorg>, <affnadd>3688 Nanhai Ave</affnadd>, <affncity>Shenzhen</affncity>, <affnpost>518060</affnpost>, <affncountry>PR China</affncountry>.</affiliation>
<affiliation> <label id="aff2">2</label><affnorg>Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province</affnorg>, <affnorg>College of Optoelectronic Engineering</affnorg>, <affnorg>Shenzhen University</affnorg>, <affnadd>3688 Nanhai Ave</affnadd>, <affncity>Shenzhen</affncity>, <affnpost>518060</affnpost>, <affncountry>PR China</affncountry>.</affiliation>
</all>

Tags: andofidxmlalllabelengineeringaffiliation
1条回答
网友
1楼 · 发布于 2024-10-06 07:04:18

使用BeautifulSoup可以使用unwrap()函数:

data = '''<?xml version='1.0' encoding='utf8'?>
<all>
<articletitle>text1<x> </x></articletitle>
<affiliation><x> </x><label id="aff1">12</label><affnorg>College of Materials Science and Engineering</affnorg><x>, </x><affnorg>Guangdong Research Center for Interfacial Engineering of Functional Materials</affnorg><x>, </x><affnorg>Shenzhen University</affnorg><x>, </x><affnadd>3688 Nanhai Ave</affnadd><x>, </x><affncity>Shenzhen</affncity><x>, </x><affnpost>518060</affnpost><x>, </x><affncountry>PR China</affncountry><x>.</x></affiliation>
<affiliation><x> </x><label id="aff2">2</label><affnorg>Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province</affnorg><x>, </x><affnorg>College of Optoelectronic Engineering</affnorg><x>, </x><affnorg>Shenzhen University</affnorg><x>, </x><affnadd>3688 Nanhai Ave</affnadd><x>, </x><affncity>Shenzhen</affncity><x>, </x><affnpost>518060</affnpost><x>, </x><affncountry>PR China</affncountry><x>.</x></affiliation>
</all>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data,'xml')

for x in soup.select('affiliation x'):
    x.unwrap()

print(soup)

印刷品:

<?xml version="1.0" encoding="utf-8"?>
<all>
<articletitle>text1<x> </x></articletitle>
<affiliation> <label id="aff1">12</label><affnorg>College of Materials Science and Engineering</affnorg>, <affnorg>Guangdong Research Center for Interfacial Engineering of Functional Materials</affnorg>, <affnorg>Shenzhen University</affnorg>, <affnadd>3688 Nanhai Ave</affnadd>, <affncity>Shenzhen</affncity>, <affnpost>518060</affnpost>, <affncountry>PR China</affncountry>.</affiliation>
<affiliation> <label id="aff2">2</label><affnorg>Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province</affnorg>, <affnorg>College of Optoelectronic Engineering</affnorg>, <affnorg>Shenzhen University</affnorg>, <affnadd>3688 Nanhai Ave</affnadd>, <affncity>Shenzhen</affncity>, <affnpost>518060</affnpost>, <affncountry>PR China</affncountry>.</affiliation>
</all>

相关问题 更多 >