IMDB web scraping in Python

Posted 2024-10-01 17:37:12


I'm new to web scraping in Python. I'm using this code:

Input: first.find('p', {'class': ''})

Output:

    Directors:
<a href="/name/nm0751577/">Anthony Russo</a>, 
<a href="/name/nm0751648/">Joe Russo</a>
<span class="ghost">|</span> 
    Stars:
<a href="/name/nm0000375/">Robert Downey Jr.</a>, 
<a href="/name/nm0262635/">Chris Evans</a>, 
<a href="/name/nm0749263/">Mark Ruffalo</a>, 
<a href="/name/nm1165110/">Chris Hemsworth</a>
</p>

Question: I want to separate the directors and the stars in the output above, and I only need the string values (the names).


1 answer
User
#1 · Posted 2024-10-01 17:37:12

You can use this example as a starting point for parsing the directors and stars:

from bs4 import BeautifulSoup


txt = '''    Directors:
<a href="/name/nm0751577/">Anthony Russo</a>,
<a href="/name/nm0751648/">Joe Russo</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000375/">Robert Downey Jr.</a>,
<a href="/name/nm0262635/">Chris Evans</a>,
<a href="/name/nm0749263/">Mark Ruffalo</a>,
<a href="/name/nm1165110/">Chris Hemsworth</a>
</p>
'''

soup = BeautifulSoup(txt, 'html.parser')

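directors, stars = [], []
for a in soup.select('a'):
    # Nearest preceding text node that contains a colon, i.e. the "Directors:" or "Stars:" label.
    # `string=` replaces the deprecated `text=` argument (Beautiful Soup 4.4.0+).
    prev = a.find_previous_sibling(string=lambda t: ':' in t)
    if prev is None:
        continue  # link with no label before it
    if 'Directors' in prev:
        directors.append(a.text)
    else:
        stars.append(a.text)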

print('Directors:', directors)
print('Stars:', stars)

Prints:

Directors: ['Anthony Russo', 'Joe Russo']
Stars: ['Robert Downey Jr.', 'Chris Evans', 'Mark Ruffalo', 'Chris Hemsworth']
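
If the paragraph can carry other labels as well, the same idea can be sketched more generally by walking the children of the `<p>` tag and grouping each link under the label that precedes it. This is only a sketch under assumptions: `split_credits` is a hypothetical helper name, the sample HTML is a shortened copy of the output in the question wrapped in a complete `<p>` element, and a "label" is taken to be any bare text node ending in a colon.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag


def split_credits(p_tag):
    """Group link texts in a credits <p> under the label that precedes them.

    Hypothetical helper: assumes every label is a bare text node ending in ':'.
    """
    credits = {}
    current = None
    for node in p_tag.children:
        if isinstance(node, NavigableString):
            text = node.strip()
            if text.endswith(':'):
                # Start a new group, e.g. "Directors:" -> "Directors".
                current = text.rstrip(':')
                credits.setdefault(current, [])
        elif isinstance(node, Tag) and node.name == 'a' and current is not None:
            credits[current].append(node.get_text(strip=True))
    return credits


# Shortened sample based on the output shown in the question.
html = '''<p class="">
    Directors:
<a href="/name/nm0751577/">Anthony Russo</a>,
<a href="/name/nm0751648/">Joe Russo</a>
<span class="ghost">|</span>
    Stars:
<a href="/name/nm0000375/">Robert Downey Jr.</a>,
<a href="/name/nm0262635/">Chris Evans</a>
</p>'''

p = BeautifulSoup(html, 'html.parser').find('p')
credits = split_credits(p)
print(credits)
```

Returning a dict keyed by label avoids hard-coding 'Directors' and 'Stars', so a singular "Director:" label would simply end up under its own key.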
