BeautifulSoup not working after the first page

Posted 2024-06-13 20:03:07


I am trying to use Python's BeautifulSoup to extract data from the following website. The data on the site is split across four pages, each with a unique link (i.e., http://insider.espn.com/nbadraft/results/top100/_/year/2019/set/0 for the first page, http://insider.espn.com/nbadraft/results/top100/_/year/2019/set/1 for the second page, and so on). I can successfully scrape the data on the first page, but when I try to scrape the second page, the result comes back empty. Here is the code I am using:

# Import libraries
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
import pandas as pd

# Define URL and request the webpage
# (the set index in the URL is zero-based, so page = 1 requests the second page)
season = 2019
page = 1
url = "http://insider.espn.com/nbadraft/results/top100/_/year/{}/set/{}".format(season, page)
req = Request(url , headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
page_soup = soup(webpage, "html.parser")

# Scrape all of the data in the table
rows = page_soup.find_all('tr')[1:]
player_stats = [[td.get_text() for td in row.find_all('td')]
                for row in rows]

# Get the column headers
headers = player_stats[0]

# Remove the first row
player_stats.pop(0)

# Convert to pandas dataframe
df = pd.DataFrame(player_stats, columns = headers)

# Remove all rows where Name = None
df = df[~df['NAME'].isnull()]

# Remove PLAYER column because it's empty
df = df.drop(columns='PLAYER')
df 
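For reference, the set index in the URL appears to be zero-based, which is easy to trip over when looping through pages. A small helper (the function name is mine, not anything from ESPN) captures the mapping I'm assuming:

```python
def top100_url(season, set_index):
    """Build the top-100 results URL.

    set_index is zero-based: 0 is the first page, 1 is the second, etc.
    """
    return ("http://insider.espn.com/nbadraft/results/top100"
            "/_/year/{}/set/{}".format(season, set_index))
```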

Any suggestions would be greatly appreciated! I'm fairly new to BeautifulSoup, so apologies in advance if the code isn't particularly good or efficient.
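One way to narrow the problem down, independent of the BeautifulSoup parsing above, is to count how many `<tr>` tags the raw HTML actually contains for each page; if the second page's response has zero, the server simply isn't sending the table. A stdlib-only sketch (the class and function names are mine):

```python
from html.parser import HTMLParser

class RowCounter(HTMLParser):
    """Count <tr> start tags in a chunk of HTML."""
    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows += 1

def count_rows(html_text):
    """Return the number of <tr> tags found in html_text."""
    counter = RowCounter()
    counter.feed(html_text)
    return counter.rows
```

Feeding `webpage.decode()` for each set index to `count_rows` would show whether the empty result happens at the HTTP level or in the parsing.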

Update: the links only work when opened in Chrome, which may be what's causing the problem. Is there any way around this?
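If the server is filtering out non-browser clients, one thing worth trying is sending a fuller set of browser-like headers than the lone User-Agent above. This is only a sketch; the header values are typical Chrome-style examples, not anything confirmed to satisfy ESPN. (If the page builds its table with JavaScript instead, header tweaks won't help and a browser-driving tool such as Selenium would be needed.)

```python
from urllib.request import Request

# Illustrative Chrome-style headers; values are assumptions, not
# requirements confirmed by the site.
BROWSER_HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/120.0 Safari/537.36'),
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9',
}

def browser_request(url):
    # Build the Request without sending it, so the headers can be
    # inspected or the object passed to urlopen() later.
    return Request(url, headers=BROWSER_HEADERS)
```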

