Only returns the first table

Posted 2024-09-28 03:23:52


I've recently been working with BeautifulSoup. I'm trying to get data from https://www.pro-football-reference.com/teams/mia/2000_roster.htm. Specifically, what I want is each player's name and 'gs' (games started).

However, when I do this, it only returns data from the first table ('Starters'). I'm not interested in that top table at all; I want the second table, named 'Roster'.

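Here is the code I'm using. As I said, I don't actually want or need anything other than the player names and games started, but I'm using this to practice and learn BeautifulSoup.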

import pandas as pd
import requests
import bs4

alpha = requests.get('https://www.pro-football-reference.com/teams/mia/2000_roster.htm')

beta = bs4.BeautifulSoup(alpha.text, 'lxml')

# Each find_all call below searches the whole parsed document
gama = beta.find_all('th', {'data-stat': 'pos'})
position = [th.text for th in gama]
position = position[1:]
position = list(filter(None, position))

gama = beta.find_all('td', {'data-stat': 'player'})
player = [td.text for td in gama]
player = player[1:]
# Remove the section-label rows that show up in the player column
while 'Defensive Starters' in player: player.remove('Defensive Starters')
while 'Special Teams Starters' in player: player.remove('Special Teams Starters')

gama = beta.find_all('td', {'data-stat': 'age'})
age = [td.text for td in gama]
age = list(filter(None, age))

gama = beta.find_all('td', {'data-stat': 'gs'})
gs = [td.text for td in gama]
gs = list(filter(None, gs))

target = pd.DataFrame({
    'player_name': player,
    'position': position,
    'gs': gs,
    'age': age,
})

Does anyone see where I went wrong? Or is there another way to do this?
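One thing worth noting about the code above: calling `find_all` on the whole soup matches cells from every table on the page, and filtering each column list separately (`filter(None, ...)`, `remove(...)`) can leave the columns with different lengths or mismatched rows. A minimal sketch, assuming the roster table is present in the parsed HTML and has `id="roster"` (the id and the sample markup below are hypothetical, not copied from the real page), limits the search to that one `<table>` and collects one record per row:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup (NOT copied from the real page): two tables
# whose cells share the same data-stat attributes. The ids are assumptions.
SAMPLE_HTML = """
<table id="starters">
  <tr><th data-stat="pos">QB</th><td data-stat="player">Starter A</td><td data-stat="gs">16</td></tr>
</table>
<table id="roster">
  <tr><th data-stat="player">Player</th><th data-stat="gs">GS</th></tr>
  <tr><td data-stat="player">Player One</td><td data-stat="gs">16</td></tr>
  <tr><td data-stat="player">Player Two</td><td data-stat="gs"></td></tr>
</table>
"""


def roster_rows(html, table_id="roster"):
    """Return (player, gs) pairs from the <table> with the given id only."""
    soup = BeautifulSoup(html, "html.parser")
    # Searching from `soup` would also match cells in the starters table;
    # searching from the chosen <table> restricts matches to that table.
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        player = tr.find("td", attrs={"data-stat": "player"})
        if player is None:
            continue  # the header row uses <th> cells, so it has no <td> player
        gs = tr.find("td", attrs={"data-stat": "gs"})
        # One record per row keeps player and gs aligned even when gs is empty.
        rows.append((player.get_text(strip=True), gs.get_text(strip=True) if gs else ""))
    return rows


print(roster_rows(SAMPLE_HTML))
```

Because each tuple comes from a single row, the list can be passed straight to `pd.DataFrame(rows, columns=['player_name', 'gs'])` without the columns drifting out of step.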


1 answer
Anonymous user · posted 2024-09-28 03:23:52

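To get the content of that table, you need to use a browser emulator, because that part of the response is generated dynamically. The data in the first table, however, can easily be accessed without one. I used Selenium in this case: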

from bs4 import BeautifulSoup
from selenium import webdriver

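driver = webdriver.Chrome()  # launches a real Chrome, so the page's JavaScript runs
page_url = "https://www.pro-football-reference.com/teams/mia/2000_roster.htm"
driver.get(page_url)
# Parse the HTML as rendered by the browser, not the raw server response
soup = BeautifulSoup(driver.page_source, "lxml")
# The second .table_outer_container holds the Roster table
table = soup.select(".table_outer_container")[1]
for items in table.select("tr"):
    player = items.select("[data-stat='player']")[0].text
    gs = items.select("[data-stat='gs']")[0].text
    print(player,gs)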

driver.quit()


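If, for some reason, you run into an error with that (for example an IndexError, which `[0]` raises when a row has no matching cell), this version avoids it: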

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
page_url = "https://www.pro-football-reference.com/teams/mia/2000_roster.htm"
driver.get(page_url)
soup = BeautifulSoup(driver.page_source, "lxml")
table = soup.select(".table_outer_container")[1]
for items in table.select("tr"):
    # Fall back to an empty string when a row has no such cell
    player = items.select("[data-stat='player']")[0].text if items.select("[data-stat='player']") else ""
    gs = items.select("[data-stat='gs']")[0].text if items.select("[data-stat='gs']") else ""
    print(player,gs)

driver.quit()
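A browser-free alternative that is often used on Sports-Reference pages: some of their secondary tables are reportedly sent inside HTML comments (`<!-- ... -->`) and only turned into real elements by JavaScript after the page loads, which would also explain why `requests` + BeautifulSoup only sees the first table. Assuming that is the case here (check the raw page source for a comment wrapped around the roster table; the `id="roster"` and the sample markup below are hypothetical), the comment text can be re-parsed as HTML. The sketch also uses `select_one` to get the same empty-string fallback as the guarded loop above:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup (NOT copied from the real page): the roster table sits
# inside an HTML comment, so the parser sees it as comment text, not elements.
SAMPLE_HTML = """
<div id="all_starters">
  <table id="starters"><tr><td data-stat="player">Starter A</td><td data-stat="gs">16</td></tr></table>
</div>
<div id="all_roster">
<!--
  <table id="roster">
    <tr><th data-stat="player">Player</th><th data-stat="gs">GS</th></tr>
    <tr><td data-stat="player">Player One</td><td data-stat="gs">16</td></tr>
    <tr><td data-stat="player">Player Two</td></tr>
  </table>
-->
</div>
"""


def cell_text(row, stat):
    """Text of the first cell with this data-stat, or "" if the row has none."""
    cell = row.select_one(f"[data-stat='{stat}']")
    return cell.get_text(strip=True) if cell is not None else ""


def find_table(soup, table_id):
    """Look for the table in the parsed page first, then inside HTML comments."""
    table = soup.find("table", id=table_id)
    if table is not None:
        return table
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # Re-parse the comment's text as HTML so its tags become real elements.
        table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
        if table is not None:
            return table
    return None


def roster_rows(html, table_id="roster"):
    """Return (player, gs) pairs from the table, whether or not it is commented out."""
    soup = BeautifulSoup(html, "html.parser")
    table = find_table(soup, table_id)
    if table is None:
        return []
    return [
        (cell_text(tr, "player"), cell_text(tr, "gs"))
        for tr in table.find_all("tr")
        if tr.find("td") is not None  # skip header rows made only of <th> cells
    ]


print(roster_rows(SAMPLE_HTML))
```

Against the real site you would pass `requests.get(page_url).text` instead of `SAMPLE_HTML`; if the roster table turns out not to be in a comment, `find_table` simply returns it from the normal parse.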
