Web scraping data from basketball-reference

1 answer

User

#1 · Posted 2024-05-20 19:22:44

To get the advanced stats table, you need to pull it out of the HTML comments (that is where it lives). I'm not sure what you mean by wanting "all advanced stats from the 2018-19 season."

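There is only one table here with id="all_advanced", and it has a single row for that season. If you mean you want to follow that link and pull that table, that's a different matter. But you weren't very clear.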
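To illustrate why the comment step matters, here is a minimal offline sketch. The `SAMPLE_HTML` string is a hypothetical, heavily simplified stand-in for the real page (the element ids are taken from this answer, the rest is assumed), and it uses only BeautifulSoup, not pandas. It shows that a normal search misses the table, while parsing the comment text finds it.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, simplified stand-in for the real page markup (an assumption).
SAMPLE_HTML = """
<div id="all_advanced">
<!--
<table id="advanced">
  <tr><th>Season</th><th>PER</th></tr>
  <tr><td>2018-19</td><td>24.4</td></tr>
</table>
-->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A normal search misses the table because it sits inside a comment node.
direct_hit = soup.find("table", id="advanced")

# Collect the comment nodes, then parse each comment's text as its own document.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
hidden_tables = []
for comment in comments:
    inner = BeautifulSoup(comment, "html.parser")
    table = inner.find("table", id="advanced")
    if table is not None:
        hidden_tables.append(table)

# Flatten the recovered table into a list of rows of cell text.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in hidden_tables[0].find_all("tr")
]
print(direct_hit)
print(rows)
```

The same idea carries over to the real page: the comment text is just a string of HTML, so it can be handed to any HTML parser.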

So, here is how to pull that table and then filter for that season's row:

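from io import StringIO

import requests
from bs4 import BeautifulSoup, Comment
import pandas as pd

# Send a browser-like User-Agent with the request
headers = {'User-Agent':
           'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

page = "https://www.basketball-reference.com/players/c/curryst01.html"
pageTree = requests.get(page, headers=headers)
pageSoup = BeautifulSoup(pageTree.content, 'html.parser')

# Collect every HTML comment; the advanced table is inside one of them
comments = pageSoup.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    if 'table' in each:
        try:
            # Parse the comment text as HTML and keep the table with id="advanced".
            # StringIO avoids the deprecation warning newer pandas emits for literal HTML strings.
            tables.append(pd.read_html(StringIO(each), attrs={'id': 'advanced'})[0])
        except ValueError:
            # read_html raises ValueError when no matching table is found
            continue

df = tables[0]
# Keep only the 2018-19 row
df_filter = df[df['Season'] == '2018-19']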

Output:

print (df.to_string())
     Season   Age   Tm   Lg  Pos    G     MP   PER    TS%   3PAr    FTr  ORB%  DRB%  TRB%  AST%  STL%  BLK%  TOV%  USG%  Unnamed: 19   OWS   DWS     WS  WS/48  Unnamed: 24  OBPM  DBPM   BPM  VORP
0   2009-10  21.0  GSW  NBA   PG   80   2896  16.3  0.568  0.332  0.175   1.8  12.0   6.8  24.6   2.5   0.5  16.5  21.8          NaN   3.0   1.6    4.7  0.077          NaN   1.1  -0.5   0.7   2.0
1   2010-11  22.0  GSW  NBA   PG   74   2489  19.4  0.595  0.325  0.216   2.3  10.9   6.5  28.1   2.2   0.6  16.4  24.4          NaN   5.4   1.3    6.6  0.128          NaN   3.0  -0.7   2.3   2.7
2   2011-12  23.0  GSW  NBA   PG   26    732  21.2  0.605  0.409  0.159   2.3  11.3   6.8  32.3   2.8   0.8  17.0  24.0          NaN   1.8   0.4    2.2  0.144          NaN   4.1   0.3   4.3   1.2
3   2012-13  24.0  GSW  NBA   PG   78   2983  21.3  0.589  0.432  0.210   2.3   9.1   5.8  31.1   2.1   0.3  13.7  26.4          NaN   8.4   2.8   11.2  0.180          NaN   5.3   0.1   5.4   5.6
4   2013-14  25.0  GSW  NBA   PG   78   2846  24.1  0.610  0.445  0.252   1.8  10.9   6.4  39.9   2.2   0.4  16.1  28.3          NaN   9.3   4.0   13.4  0.225          NaN   6.3   1.1   7.4   6.7
5   2014-15  26.0  GSW  NBA   PG   80   2613  28.0  0.638  0.482  0.251   2.4  11.4   7.0  38.6   3.0   0.5  14.3  28.9          NaN  11.5   4.1   15.7  0.288          NaN   8.2   1.7   9.9   7.9
6   2015-16  27.0  GSW  NBA   PG   79   2700  31.5  0.669  0.554  0.250   2.9  13.6   8.6  33.7   3.0   0.4  12.9  32.6          NaN  13.8   4.1   17.9  0.318          NaN  10.3   1.6  11.9   9.5
7   2016-17  28.0  GSW  NBA   PG   79   2638  24.6  0.624  0.547  0.251   2.7  11.4   7.3  31.2   2.6   0.5  13.0  30.1          NaN   8.7   3.9   12.6  0.229          NaN   6.7   0.3   6.9   5.9
8   2017-18  29.0  GSW  NBA   PG   51   1631  28.2  0.675  0.580  0.350   2.7  14.4   9.0  30.3   2.4   0.4  13.3  31.0          NaN   7.2   1.9    9.1  0.267          NaN   7.8   0.0   7.7   4.0
9   2018-19  30.0  GSW  NBA   PG   69   2331  24.4  0.641  0.604  0.214   2.2  14.2   8.4  24.2   1.9   0.9  11.6  30.4          NaN   7.2   2.5    9.7  0.199          NaN   7.1  -0.5   6.6   5.1
10  2019-20  31.0  GSW  NBA   PG    5    139  21.7  0.557  0.598  0.317   3.0  17.8  10.1  42.3   1.7   1.3  14.6  33.6          NaN   0.2   0.1    0.3  0.104          NaN   4.5  -0.6   3.9   0.2
11   Career   NaN  NaN  NBA  NaN  699  23998  23.8  0.623  0.481  0.237   2.3  11.8   7.2  31.5   2.5   0.5  14.2  27.9          NaN  76.5  26.7  103.2  0.207          NaN   6.0   0.4   6.4  50.7

And the filtered row:

print (df_filter.to_string())
    Season   Age   Tm   Lg Pos   G    MP   PER    TS%   3PAr    FTr  ORB%  DRB%  TRB%  AST%  STL%  BLK%  TOV%  USG%  Unnamed: 19  OWS  DWS   WS  WS/48  Unnamed: 24  OBPM  DBPM  BPM  VORP
9  2018-19  30.0  GSW  NBA  PG  69  2331  24.4  0.641  0.604  0.214   2.2  14.2   8.4  24.2   1.9   0.9  11.6  30.4          NaN  7.2  2.5  9.7  0.199          NaN   7.1  -0.5  6.6   5.1
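The output above still carries two all-NaN spacer columns (`Unnamed: 19`, `Unnamed: 24`) and a `Career` summary row. If those get in the way, one possible cleanup is sketched below. The small DataFrame is hand-copied from a few cells of the table above rather than scraped, and the variable names are only illustrative.

```python
import pandas as pd

# Hand-copied subset of the table shown above (not scraped here).
df = pd.DataFrame({
    "Season": ["2017-18", "2018-19", "Career"],
    "PER": [28.2, 24.4, 23.8],
    "Unnamed: 19": [float("nan")] * 3,
    "WS": [9.1, 9.7, 103.2],
})

# Drop columns that are entirely NaN (the blank spacer columns).
cleaned = df.dropna(axis=1, how="all")

# Keep only per-season rows, then select a single season.
seasons_only = cleaned[cleaned["Season"] != "Career"]
season_2018_19 = seasons_only[seasons_only["Season"] == "2018-19"]
print(season_2018_19.to_string(index=False))
```

Using `dropna(axis=1, how="all")` rather than dropping the columns by name keeps the cleanup working if the spacer columns are numbered differently on another table.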
