Extracting custom XML tags

<item> <title>How Kerala is preparing for monsoon amid the COVID-19 pandemic</title> <link/>https://www.thenewsminute.com/article/how-kerala-preparing-monsoon-amid-covid-19-pandemic-125007 <description>Usually, Kerala begins its procedure for monsoon preparedness by January. This year, however, the officials got busy with preparing for a health crisis instead. “Kerala works six months and fights the monsoon in the other six months,” says Sekhar Kuriakose, member secretary of the Kerala State Disaster Management Authority (KSDMA). Usually, Kerala begins its monsoon preparedness by January, even before the India Meteorological Department (IMD) makes its first long-range forecast for southwe...</description> <pubdate>Thu, 21 May 2020 10:30:00 GMT</pubdate> <guid>https://www.thenewsminute.com/article/how-kerala-preparing-monsoon-amid-covid-19-pandemic-125007</guid> <media:content medium="image" url="https://www.thenewsminute.com/sites/default/files/Kerala-rain-trivandrum-1200.jpg" width="600"></media:content> </item>

1 answer

User

Answer 1 · posted 2024-10-02 22:36:55

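Your problem is probably how BS4 handles namespaces with the parser backend you are using. Specifying "lxml" instead of "xml" lets you use find() and find_all() as you would expect in this case. Note that "lxml" selects lxml's HTML parser, which treats media:content as an ordinary tag name and lowercases tag names (pubDate becomes pubdate, as in your snippet).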

Let t be a string containing the XML you provided.

from bs4 import BeautifulSoup

soup = BeautifulSoup(t, "xml")  # lxml's XML parser
print(soup.find_all("media:content"))

prints

[]

With the lxml parser, however, the element is found:

soup = BeautifulSoup(t, "lxml")  # lxml's HTML parser
print(soup.find_all("media:content"))

prints

[<media:content medium="image" (...)></media:content>]
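If you would rather keep namespace-aware parsing, one alternative is to declare the prefix yourself. The following is a minimal sketch using Python's standard-library xml.etree.ElementTree instead of BS4. It assumes the media prefix refers to the conventional Media RSS namespace URI, which your snippet does not declare, and it uses a shortened copy of your item with a hypothetical rss wrapper element added only to carry the declaration.

```python
import xml.etree.ElementTree as ET

# Assumption: the conventional Media RSS namespace URI.
# The snippet in the question does not declare it.
MEDIA_NS = "http://search.yahoo.com/mrss/"

# Shortened copy of the item from the question.
item = (
    "<item>"
    "<title>How Kerala is preparing for monsoon amid the COVID-19 pandemic</title>"
    '<media:content medium="image" '
    'url="https://www.thenewsminute.com/sites/default/files/Kerala-rain-trivandrum-1200.jpg" '
    'width="600"></media:content>'
    "</item>"
)

# Wrap the fragment in a root element that declares the "media" prefix,
# because ElementTree refuses to parse an undeclared prefix.
wrapped = f'<rss xmlns:media="{MEDIA_NS}">{item}</rss>'
root = ET.fromstring(wrapped)

# Map the prefix to its URI so it can be used in the search path.
media = root.find(".//media:content", {"media": MEDIA_NS})
image_url = media.get("url")
print(image_url)
```

In a complete feed the namespace is normally declared on the root element already, so the wrapping step would not be needed there.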
