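I am trying to convert an HTML table into a Python dict using BeautifulSoup. Because the table has multiple levels, the information is not being saved correctly.
Here is what I tried: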
import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/title/tt8579674/awards'
response = requests.get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

award_list = []
award_dict = {}  # maps 'award name_category' to the outcome
for table in html_soup.find_all('table', {'class': 'awards'}):
    for tr in table.find_all('tr'):
        # First column: outcome on the first line, award name on the second
        for title_award_outcome in tr.find_all('td', {'class': 'title_award_outcome'}):
            award_name = title_award_outcome.get_text(
                separator='<br/>', strip=True).split('<br/>', 1)[1]
        # Second column: the category is the first line of the cell
        for award_description in tr.find_all('td', {'class': 'award_description'}):
            award_description = award_description.get_text(
                separator='<br/>', strip=True).split('<br/>', 1)[0]
            award = award_name + '_' + award_description
        # The result is only stored for rows that contain the first-column cell
        for title_award_outcome in tr.find_all('td', {'class': 'title_award_outcome'}):
            result = title_award_outcome.get_text(
                separator='<br/>', strip=True).split('<br/>', 1)[0]
            award_dict[award] = result
award_list.append(award_dict)
print(award_list)
This only returns the first item from the second column.
Expected result:
[{'Golden Globe_Best Motion Picture - Drama': 'Winner',
'Golden Globe_Best Original Score - Motion Picture': 'Nominee',
'Golden Globe_Best Original Score - Motion Picture': 'Nominee',
'BAFTA Film Award_Best Director': 'Nominee',
'BAFTA Film Award_Outstanding British Film of the Year': 'Nominee',
etc, etc, etc}]
To create the desired dictionary, you can use the following example:
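A minimal sketch of one way to do this, assuming the page still uses the `awards`, `title_award_outcome`, and `award_description` class names from the question, and that the first-column cell spans several rows so it is present only in the first row of each group. The idea is to remember the most recent outcome and award name while walking the rows. The `awards_to_dict` helper and the `SAMPLE_HTML` markup are hypothetical names introduced here for illustration; the sample stands in for `response.text` and reuses entries from the expected result above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the table layout described in the question.
SAMPLE_HTML = """
<table class="awards">
  <tr>
    <td class="title_award_outcome" rowspan="1">
      <b>Winner</b><br/><span>Golden Globe</span>
    </td>
    <td class="award_description">Best Motion Picture - Drama<br/>extra text</td>
  </tr>
  <tr>
    <td class="title_award_outcome" rowspan="2">
      <b>Nominee</b><br/><span>BAFTA Film Award</span>
    </td>
    <td class="award_description">Best Director<br/>extra text</td>
  </tr>
  <tr>
    <td class="award_description">Outstanding British Film of the Year</td>
  </tr>
</table>
"""


def awards_to_dict(html):
    """Return a dict mapping 'award name_category' to the outcome."""
    soup = BeautifulSoup(html, "html.parser")
    awards = {}
    for table in soup.find_all("table", class_="awards"):
        outcome = award_name = None
        for tr in table.find_all("tr"):
            # The outcome cell appears only in the first row of a group,
            # so its values are carried over to the rows that follow.
            outcome_cell = tr.find("td", class_="title_award_outcome")
            if outcome_cell is not None:
                parts = list(outcome_cell.stripped_strings)
                if len(parts) >= 2:
                    outcome, award_name = parts[0], parts[1]
            description_cell = tr.find("td", class_="award_description")
            if description_cell is None or award_name is None:
                continue
            # The category is the first text fragment in the description cell.
            category = next(description_cell.stripped_strings, None)
            if category:
                awards[f"{award_name}_{category}"] = outcome
    return awards


award_list = [awards_to_dict(SAMPLE_HTML)]
print(award_list)
```

To run this against the live page, pass `response.text` from the question's `requests.get(url)` call instead of `SAMPLE_HTML`. Note that dictionary keys are unique, so two rows with the same award name and category would collapse into one entry.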