BeautifulSoup XML Python: extract an attribute from a specific tag

Posted 2024-10-03 11:22:02


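This XML document contains a set of events-data tags. I want to extract information from the most recent events-data. For example, in the code below, I want to go to the last events-data tag, descend to its event-date tag, and extract the text of the date child tag. I'm currently traversing the document with BeautifulSoup in Python. Any ideas?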

    <?xml version="1.0" encoding="UTF-8"?>
        <first-tag>
          <second-tag>
            <events-data>
               <event-date>
                    <date>20040913</date>
               </event-date>
            </events-data>

          <events-data> #the one i want to traverse to grab date text
             <event-date>
               <date>20040913</date>
             </event-date>
          </events-data> 
         </second-tag>
       </first-tag>

1 answer
User
#1 · Posted 2024-10-03 11:22:02

This uses BeautifulSoup 3:

# BeautifulSoup 3 (runs on Python 2 only); BeautifulStoneSoup is its XML parser class
from BeautifulSoup import BeautifulStoneSoup

xml_str = \
'''
<?xml version="1.0" encoding="UTF-8"?>
    <first-tag>
      <second-tag>
        <events-data>
           <event-date>
                <date>20040913</date>
           </event-date>
        </events-data>

      <events-data>
         <event-date>
           <date>20040913</date>
         </event-date>
      </events-data> 
     </second-tag>
   </first-tag>
'''
soup = BeautifulStoneSoup(xml_str)

# Collect every <events-data> tag in document order
events = soup.findAll("events-data")
if events:
    # events[-1] is the last <events-data> tag; .text returns the text
    # it contains, which here is just the <date> value
    print(events[-1].text)
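
For anyone on Python 3, where BeautifulSoup 3 cannot be installed, here is a minimal sketch of the same idea using the standard library's xml.etree.ElementTree instead of BeautifulSoup. This is a swapped-in technique, not what the answer above uses, and the helper name last_event_date is made up for illustration. It finds every events-data element, takes the last one, and reads the date under its event-date child.

```python
import xml.etree.ElementTree as ET

# Sample document from the question. The XML declaration has to be the very
# first thing in the string, so the text starts right after the quotes.
XML_STR = """<?xml version="1.0" encoding="UTF-8"?>
<first-tag>
  <second-tag>
    <events-data>
      <event-date>
        <date>20040913</date>
      </event-date>
    </events-data>
    <events-data>
      <event-date>
        <date>20040913</date>
      </event-date>
    </events-data>
  </second-tag>
</first-tag>
"""


def last_event_date(xml_text):
    """Hypothetical helper: return the <date> text under the last
    <events-data> element, or None if it cannot be found."""
    # Parse from bytes so the encoding declaration is honoured by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))

    # iter() walks the whole tree in document order, so the last item
    # in the list is the last <events-data> in the file.
    events = list(root.iter("events-data"))
    if not events:
        return None

    # Descend events-data -> event-date -> date.
    date = events[-1].find("event-date/date")
    if date is None or date.text is None:
        return None
    return date.text.strip()


if __name__ == "__main__":
    print(last_event_date(XML_STR))
```

Returning None instead of raising keeps the caller's check as simple as the `if events:` test in the answer above; whether that is the right behaviour for a missing tag depends on your data.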
