Processing XML tags and extracting the corresponding tag contents

Posted 2024-09-28 18:57:07

The processed XML file looks like this:

<dblp>
<incollection>
<author>Philippe Balbiani</author>
<author>Valentin Goranko</author>
<author>Ruaan Kellerman</author>
<author>Dimiter Vakarelov</author>
<booktitle>Handbook of Spatial Logics</booktitle>
</incollection>
<incollection>
<author>Jochen Renz</author>
<author>Bernhard Nebel</author>
<booktitle>Handbook of AI</booktitle>
</incollection>
</incollection>
...
</dblp>

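Given the format above, I want to extract the contents of the "author" and "booktitle" tags, both of which sit inside "incollection" tags. I iterate over each "incollection" tag and want every author in it to be paired with that element's single "booktitle", so that several authors map to one title.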

My code:

soup = BeautifulSoup(str(getfile()), 'lxml')
res = soup.find_all('incollection') 
list = []
list1=[]

for each in res:
    for child in each.children:
          if child.name == 'author':
                list.append(child.text)

          if child.name == 'booktitle':
                list1.append(child.text)           
                elem_dic = tuple(zip(list, list1))

My result is:

('Philippe Balbiani', 'Handbook of Spatial Logics')
('Valentin Goranko', 'Handbook of Spatial Logics')
('Ruaan Kellerman', 'Handbook of Spatial Logics')

The desired result is:

('Philippe Balbiani', 'Handbook of Spatial Logics')
('Valentin Goranko', 'Handbook of Spatial Logics')
('Ruaan Kellerman', 'Handbook of Spatial Logics')
('Dimiter Vakarelov', 'Handbook of Spatial Logics')
('Jochen Renz', 'Handbook of AI')
('Bernhard Nebel', 'Handbook of AI')

How can I modify the code to get the expected result?


1 answer
Anonymous user
Answer #1 · Posted 2024-09-28 18:57:07

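I modified your code as shown below. zip pairs the i-th author with the i-th title and stops at the shorter list, so zipping one flat list of authors against one flat list of titles cannot give every author its own title. Pair each author with the booktitle of its own incollection element instead.

from bs4 import BeautifulSoup

soup = BeautifulSoup(str(getfile()), 'lxml')  # getfile() is the helper from the question
pairs = []

for each in soup.find_all('incollection'):
    # Look up the title inside this element only, so authors are never paired across elements
    title = each.find('booktitle')
    if title is None:
        continue  # skip entries that have no booktitle
    for child in each.find_all('author'):
        pairs.append((child.text, title.text))

elem_dic = tuple(pairs)  # build the tuple once, after the loop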
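
For a version that runs without the question's getfile() helper, here is a minimal sketch of the same per-element pairing. It uses the standard library's xml.etree.ElementTree rather than BeautifulSoup, which is a swap made only so the example is self-contained. The SAMPLE string is an abbreviated copy of the data in the question, and the function name extract_pairs is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the question's data (the "..." line is left out).
SAMPLE = """<dblp>
<incollection>
<author>Philippe Balbiani</author>
<author>Valentin Goranko</author>
<author>Ruaan Kellerman</author>
<author>Dimiter Vakarelov</author>
<booktitle>Handbook of Spatial Logics</booktitle>
</incollection>
<incollection>
<author>Jochen Renz</author>
<author>Bernhard Nebel</author>
<booktitle>Handbook of AI</booktitle>
</incollection>
</dblp>"""


def extract_pairs(xml_text):
    """Return one (author, booktitle) tuple per author (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    pairs = []
    for entry in root.iter('incollection'):
        # Read the title from this entry only, never from a neighbouring one.
        title = entry.findtext('booktitle')
        if title is None:
            continue  # skip entries without a booktitle
        for author in entry.findall('author'):
            pairs.append((author.text, title))
    return pairs


for pair in extract_pairs(SAMPLE):
    print(pair)
```

On the sample data this prints the six tuples listed as the desired result in the question. Reading the title once per incollection element, instead of collecting all titles into a separate list, is what keeps each author attached to the right book.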
