Accessing repeated feed tags with feedparser

<podcast:person role="host" img="https://dudesanddadspodcast.com/files/2019/03/andy.jpg" href="https://www.podchaser.com/creators/andy-lehman-107aRuVQLA">Andy Lehman</podcast:person>
<podcast:person role="host" img="https://dudesanddadspodcast.com/files/2019/03/joel.jpg" href="https://www.podchaser.com/creators/joel-demott-107aRuVQLH">Joel DeMott</podcast:person>

>>> import feedparser
>>> d = feedparser.parse('https://feeds.podcastmirror.com/dudesanddadspodcast')
>>> d.feed['podcast_person']
{'role': 'host', 'img': 'https://dudesanddadspodcast.com/files/2019/03/joel.jpg', 'href': 'https://www.podchaser.com/creators/joel-demott-107aRuVQLH'}

3 answers

User

#1 · edited 2024-10-01 00:26:13

For this task I prefer BeautifulSoup over feedparser.

You can copy the following code to check the result:

from bs4 import BeautifulSoup
import requests

r = requests.get("https://feeds.podcastmirror.com/dudesanddadspodcast").content
soup = BeautifulSoup(r, 'html.parser')

feeds = soup.find_all("podcast:person")

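print(type(feeds))  # <class 'bs4.element.ResultSet'>, a list subclass

# `feeds` can be looped over like any list.
for person in feeds:
    print(person.get_text(), person.get("role"))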
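
If you would rather not add a dependency, the same idea can be sketched with the standard library's xml.etree.ElementTree. This is only a minimal sketch under a few assumptions: the namespace URI bound to the `podcast:` prefix is assumed here, so check the `xmlns:podcast` attribute on the real feed's root element; `SAMPLE_FEED` and `collect_people` are hypothetical names, and the inline XML (with example.com links) stands in for the downloaded feed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the `podcast:` prefix. Check the xmlns:podcast
# attribute on the real feed's <rss> element and adjust if it differs.
NS = {"podcast": "https://podcastindex.org/namespace/1.0"}

# Hypothetical inline sample standing in for the downloaded feed.
SAMPLE_FEED = """<rss version="2.0" xmlns:podcast="https://podcastindex.org/namespace/1.0">
  <channel>
    <title>Sample</title>
    <podcast:person role="host" href="https://example.com/andy">Andy Lehman</podcast:person>
    <podcast:person role="host" href="https://example.com/joel">Joel DeMott</podcast:person>
  </channel>
</rss>"""


def collect_people(xml_text):
    """Return one dict per <podcast:person> element, in document order."""
    root = ET.fromstring(xml_text)
    people = []
    for elem in root.findall(".//podcast:person", NS):
        person = dict(elem.attrib)  # copies role, img, href, ... when present
        person["name"] = (elem.text or "").strip()
        people.append(person)
    return people


people = collect_people(SAMPLE_FEED)
for person in people:
    print(person["name"], person.get("role"))
```

Because every matching element is appended to the list, a repeated tag does not overwrite the previous one, which is the problem described in the question.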

User

#2 · edited 2024-10-01 00:26:13

Idea #1:

from bs4 import BeautifulSoup
import requests

r = requests.get("https://feeds.podcastmirror.com/dudesanddadspodcast").content
soup = BeautifulSoup(r, 'html.parser')

soup.find_all("podcast:person")

Output:

[<podcast:person href="https://www.podchaser.com/creators/andy-lehman-107aRuVQLA" img="https://dudesanddadspodcast.com/files/2019/03/andy.jpg" role="host">Andy Lehman</podcast:person>,
 <podcast:person href="https://www.podchaser.com/creators/joel-demott-107aRuVQLH" img="https://dudesanddadspodcast.com/files/2019/03/joel.jpg" role="host">Joel DeMott</podcast:person>,
 <podcast:person href="https://www.podchaser.com/creators/cory-martin-107aRwmCuu" img="" role="guest">Cory Martin</podcast:person>,
 <podcast:person href="https://www.podchaser.com/creators/julie-lehman-107aRuVQPL" img="" role="guest">Julie Lehman</podcast:person>]

Idea #2:

import feedparser

d = feedparser.parse('https://feeds.podcastmirror.com/dudesanddadspodcast')
hosts = d.entries[1]['authors'][1]['name'].split(", ")

print("The hosts of this Podcast are {} and {}.".format(hosts[0], hosts[1]))

Output:

The hosts of this Podcast are Joel DeMott and Andy Lehman.
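
The second idea hard-codes exactly two hosts. A rough sketch of a version that copes with any number of names might look like the following, assuming the author field is a comma-separated string as in the snippet above; `describe_hosts` is a hypothetical helper, not part of feedparser.

```python
def describe_hosts(author_field):
    """Turn a comma-separated author string into a readable sentence."""
    # Split on commas and drop empty pieces and surrounding whitespace.
    hosts = [name.strip() for name in author_field.split(",") if name.strip()]
    if not hosts:
        return "No hosts are listed for this Podcast."
    if len(hosts) == 1:
        return "The host of this Podcast is {}.".format(hosts[0])
    # Join all but the last name with commas, then add the last with "and".
    return "The hosts of this Podcast are {} and {}.".format(
        ", ".join(hosts[:-1]), hosts[-1]
    )


print(describe_hosts("Joel DeMott, Andy Lehman"))
```

With feedparser you would pass in the string that idea #2 reads from the entry's authors list instead of the literal used here.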

User

#3 · edited 2024-10-01 00:26:13

You can iterate over feed['items'] to get every record:

import feedparser

feed = feedparser.parse('https://feeds.podcastmirror.com/dudesanddadspodcast')

if feed:
    for item in feed['items']:
        print(f'{item["title"]} - {item["author"]}')
