I'm trying to use bs4 to isolate the "Career history" part (the list of teams a player has played for) of an NFL QB's table.
My desired output is:
['St. Louis Rams (2005–2006)', 'Cincinnati Bengals (2007–2008)', 'Buffalo Bills (2009–2012)', 'Tennessee Titans (2013)', 'Houston Texans (2014)', 'New York Jets (2015–2016)', 'Tampa Bay Buccaneers (2017–2018)', 'Miami Dolphins (2019–present)']
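My code is:
import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/Ryan_Fitzpatrick'
player_wiki = requests.get(url)
table = BeautifulSoup(player_wiki.text, 'html.parser')
# Prints one list per <ul> found inside the first <tbody>
for tr in table.find('tbody').find_all('ul'):
    v = [li.text for li in tr.find_all('li')]
    print(v)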
Current output:
['St. Louis Rams (2005–2006)', 'Cincinnati Bengals (2007–2008)', 'Buffalo Bills (2009–2012)', 'Tennessee Titans (2013)', 'Houston Texans (2014)', 'New York Jets (2015–2016)', 'Tampa Bay Buccaneers (2017–2018)', 'Miami Dolphins (2019–present)']
['Ivy League Player of the Year (2004)', 'First-team All–Ivy League (2004)', 'George H. “Bulger” Lowe Award (2004)']
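I'm fairly sure the problem is the 'ul' tag in my outer loop. How can I narrow the scope of find_all() so that it leaves out the unwanted data? Any suggestions? I'm new to web scraping.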
You can use soup.find_all:
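As a minimal sketch of that idea: the question's output suggests that the career-history list is the first <ul> inside the first <tbody>, so one option is to cap find_all with limit=1. The helper name first_list_items and the sample HTML are hypothetical stand-ins, not the real page markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the page's table markup.
SAMPLE_HTML = """
<table><tbody>
<tr><td><ul>
<li><a href="#">St. Louis Rams</a> (2005–2006)</li>
<li>Cincinnati Bengals (2007–2008)</li>
</ul></td></tr>
<tr><td><ul>
<li>Ivy League Player of the Year (2004)</li>
</ul></td></tr>
</tbody></table>
"""


def first_list_items(html):
    """Return the <li> texts of the first <ul> in the first <tbody>."""
    soup = BeautifulSoup(html, 'html.parser')
    # limit=1 stops after the first match, so later lists (the awards) are skipped.
    lists = soup.find('tbody').find_all('ul', limit=1)
    return [li.get_text().strip() for ul in lists for li in ul.find_all('li')]


print(first_list_items(SAMPLE_HTML))
```

This relies purely on position, so it may break if another list is ever placed ahead of the career history in the table.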
Output:
Method 1 - using requests and beautifulsoup4:
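A possible sketch of this approach, under the assumption that the section is introduced by a <th> cell whose text contains "Career history" and that the team list is the next <ul> after it. The function names and the User-Agent string are hypothetical.

```python
import requests
from bs4 import BeautifulSoup

URL = 'https://en.wikipedia.org/wiki/Ryan_Fitzpatrick'


def career_history(html):
    """Return the <li> texts of the list that follows the 'Career history' header."""
    soup = BeautifulSoup(html, 'html.parser')
    # Assumption: the section header is a <th> whose text contains this label.
    header = soup.find(
        lambda tag: tag.name == 'th' and 'Career history' in tag.get_text()
    )
    if header is None:
        return []
    # find_next walks forward in document order to the first <ul> after the header.
    team_list = header.find_next('ul')
    if team_list is None:
        return []
    return [li.get_text().strip() for li in team_list.find_all('li')]


def fetch_career_history(url=URL):
    """Download the page and extract the career history (needs network access)."""
    # Hypothetical User-Agent; replace it with one that identifies your script.
    headers = {'User-Agent': 'career-history-example/0.1'}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return career_history(response.text)
```

Calling fetch_career_history() should return a single list of teams if the page matches the assumed structure. Anchoring on the header text rather than on position keeps the awards list out even if the table layout shifts.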
Method 2 - using the wikipedia module:
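One way this could look, assuming the third-party wikipedia package is installed and that the rendered page contains a table with class "infobox" whose first list is the career history. The function names are hypothetical, and the page title is taken from the URL in the question.

```python
from bs4 import BeautifulSoup


def first_infobox_list(html):
    """Return the <li> texts of the first <ul> inside a table with class 'infobox'."""
    soup = BeautifulSoup(html, 'html.parser')
    # Assumption: the first list in the infobox is the career history.
    team_list = soup.select_one('table.infobox ul')
    if team_list is None:
        return []
    return [li.get_text().strip() for li in team_list.find_all('li')]


def fetch_with_wikipedia(title='Ryan Fitzpatrick'):
    """Fetch the rendered page HTML via the wikipedia package (needs network access)."""
    # Imported here so the parsing helper above works without the package installed.
    import wikipedia

    # auto_suggest=False asks for the exact title instead of a suggested one.
    page = wikipedia.page(title, auto_suggest=False)
    return first_infobox_list(page.html())
```

The CSS selector limits the search to the infobox, so lists elsewhere in the article body are never visited.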
Output: