Finding correlated scores within a team in a DataFrame

Posted 2024-06-25 05:56:55


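I have a list of NBA player scores that spans several days. My goal is to determine which players score well together on the same day.

My dataset has date, player name, team, and points columns: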

Date        Team  Name             Points
2020-12-22  LAL   Dennis Schroder  43
2020-12-22  LAL   LeBron James     35
2020-12-22  LAL   Kyle Kuzma       15.75
2020-12-23  LAL   Dennis Schroder  22
2020-12-23  LAL   LeBron James     23.25
2020-12-23  LAL   Kyle Kuzma       39.75
2020-12-24  LAL   Dennis Schroder  40
2020-12-24  LAL   LeBron James     55.25
2020-12-24  LAL   Kyle Kuzma       7

Link: https://docs.google.com/spreadsheets/d/e/2PACX-1vSqawsLtGqzIoptqIXY8MLF0TlLtMSoiXuE2EM3HgiAXrbXCnYTSSfI5pF0KYuzH_lYKU00dU6ED_76/pub?gid=0&single=true&output=csv

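Ideally, I would be able to filter down to one team and run something like df.T.corr(), so that the player names end up in a matrix where each player is compared with the other players on the same team.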

import pandas as pd
df = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSqawsLtGqzIoptqIXY8MLF0TlLtMSoiXuE2EM3HgiAXrbXCnYTSSfI5pF0KYuzH_lYKU00dU6ED_76/pub?gid=0&single=true&output=csv")
playerdf = df[['Name', 'Date', 'Points', 'Team']]
playerdf = playerdf[playerdf['Team']=='LAL']
playerdf.corr()     #only correlates the columns to each other 
playerdf.T.corr()   #returns an empty dataframe
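In my example, a correlation matrix should show a positive correlation between LeBron and Dennis, and a negative correlation between Kyle and both of those players.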

1 Answer
User
#1 · Posted 2024-06-25 05:56:55

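Correlation only works on numeric variables. When you look at a correlation, you are essentially asking, "as x increases/decreases, does y increase/decrease?"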
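To make that question concrete, here is a minimal sketch that computes the Pearson coefficient by hand for the three sample dates in the question and compares it with what pandas returns. The helper name `pearson_r` is my own, not a pandas function.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Points from the question's sample rows (three game dates)
schroder = [43, 22, 40]
lebron = [35, 23.25, 55.25]
kuzma = [15.75, 39.75, 7]

r_up = pearson_r(schroder, lebron)    # positive: they rise and fall together
r_down = pearson_r(schroder, kuzma)   # negative: one rises when the other falls

# pandas computes the same quantity (Pearson is the default method)
r_pandas = pd.Series(schroder).corr(pd.Series(lebron))
```

Both inputs have to be numeric and aligned game by game, which is why the layout of the data matters.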

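Your question is the right one: "as LeBron James's points go up/down, do player B's points go up/down?" But your data is not laid out that way. Transposing it only gives you one column per player-game row: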

playerdf.T
Out[66]: 
                    2             4    ...              409         423
Name    Dennis Schroder  LeBron James  ...  Markieff Morris  Marc Gasol
Date         2020-12-22    2020-12-22  ...       2020-12-25  2020-12-25
Points               43         35.25  ...            24.25       12.75
Team                LAL           LAL  ...              LAL         LAL

[4 rows x 26 columns]

As an aside, I'm curious how these point totals are calculated.

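We need to pivot so that each instance/row is a date/game, the columns are the player names, and the values are the points. Once you have done that, you can pass it to the .corr() method.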
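As a small self-contained sketch of that reshaping, the nine sample rows from the question can be typed in directly (so no download is needed), pivoted, and correlated. The variable names `sample`, `wide`, and `corr_sample` are just illustrative.

```python
import pandas as pd

# The sample rows shown in the question
rows = [
    ("2020-12-22", "LAL", "Dennis Schroder", 43),
    ("2020-12-22", "LAL", "LeBron James", 35),
    ("2020-12-22", "LAL", "Kyle Kuzma", 15.75),
    ("2020-12-23", "LAL", "Dennis Schroder", 22),
    ("2020-12-23", "LAL", "LeBron James", 23.25),
    ("2020-12-23", "LAL", "Kyle Kuzma", 39.75),
    ("2020-12-24", "LAL", "Dennis Schroder", 40),
    ("2020-12-24", "LAL", "LeBron James", 55.25),
    ("2020-12-24", "LAL", "Kyle Kuzma", 7),
]
sample = pd.DataFrame(rows, columns=["Date", "Team", "Name", "Points"])

# One row per date, one column per player, points as the values
wide = sample[sample["Team"] == "LAL"].pivot(
    index="Date", columns="Name", values="Points"
)

# Player-by-player correlation matrix
corr_sample = wide.corr()
```

On these three dates the matrix has the signs the question expected: Schroder and James are positively correlated, and Kuzma is negatively correlated with both. Three games is far too few to read anything into the magnitudes, though.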

With data from only 2 games/dates, though, you won't see much:

import pandas as pd
file = "https://docs.google.com/spreadsheets/d/e/2PACX-1vRlZiz12o4zOCRrjuTgBFlUwRjWKz2v2o4-B8dZ6C-kHwkmI5wRWMO4vS9u2bRVtCy9UJkwPXp-BKCw/pub?gid=0&single=true&output=csv"

df = pd.read_csv(file)

playerdf = df[['Name', 'Date', 'Points', 'Team']]
playerdf = playerdf[playerdf['Team']=='LAL']

playerdf = playerdf.pivot(index='Date',
                              columns='Name',
                              values='Points').fillna(0)

corr = playerdf.corr()

Output:

print (corr.to_string())
Name                      Alex Caruso  Alfonzo McKinnie  Anthony Davis  Dennis Schroder  Jared Dudley  Kentavious Caldwell-Pope  Kyle Kuzma  LeBron James  Marc Gasol  Markieff Morris  Montrezl Harrell  Quinn Cook  Talen Horton-Tucker  Wes Matthews
Name                                                                                                                                                                                                                                                   
Alex Caruso                       1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Alfonzo McKinnie                  1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Anthony Davis                     1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Dennis Schroder                  -1.0              -1.0           -1.0              1.0          -1.0                       1.0        -1.0          -1.0        -1.0             -1.0               1.0         NaN                 -1.0          -1.0
Jared Dudley                      1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Kentavious Caldwell-Pope         -1.0              -1.0           -1.0              1.0          -1.0                       1.0        -1.0          -1.0        -1.0             -1.0               1.0         NaN                 -1.0          -1.0
Kyle Kuzma                        1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
LeBron James                      1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Marc Gasol                        1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Markieff Morris                   1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Montrezl Harrell                 -1.0              -1.0           -1.0              1.0          -1.0                       1.0        -1.0          -1.0        -1.0             -1.0               1.0         NaN                 -1.0          -1.0
Quinn Cook                        NaN               NaN            NaN              NaN           NaN                       NaN         NaN           NaN         NaN              NaN               NaN         NaN                  NaN           NaN
Talen Horton-Tucker               1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
Wes Matthews                      1.0               1.0            1.0             -1.0           1.0                      -1.0         1.0           1.0         1.0              1.0              -1.0         NaN                  1.0           1.0
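The reason every cell above is 1.0, -1.0, or NaN is that two points always lie exactly on a line, and a column that never changes has zero variance. A tiny sketch with made-up players (the names and numbers are hypothetical, not from the dataset):

```python
import pandas as pd

two_games = pd.DataFrame(
    {
        "Player A": [10, 20],
        "Player B": [31, 30],  # barely changes, but in the opposite direction
        "Player C": [0, 0],    # constant, e.g. missing games filled with 0
    },
    index=["game 1", "game 2"],
)

corr_two = two_games.corr()

a_vs_b = corr_two.loc["Player A", "Player B"]  # a perfect negative correlation
a_vs_c = corr_two.loc["Player A", "Player C"]  # NaN: Player C has zero variance
```

So the matrix from two dates only tells you the direction of change between those two games, not how strongly the players' scoring is related.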

If I go back and grab a full season's worth of data:

import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/teams/LAL/2019_games.html'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')
links = table.find_all('a', href=True)

boxscore_links = []
for link in links:
    if 'boxscores' in link['href'] and '.html' in link['href']:
        boxscore_links.append('https://www.basketball-reference.com' + link['href'])
        
frames = []  # one DataFrame per game
for link in boxscore_links:
    print(link)
    temp_df = pd.read_html(link, header=1, attrs={'id': 'box-LAL-game-basic'})[0]
    temp_df = temp_df[['Starters', 'PTS']]
    # Drop the section header row and the totals row
    temp_df = temp_df[~temp_df['Starters'].isin(['Team Totals', 'Reserves'])].copy()
    # Players who did not appear in the game get 0 points
    temp_df['PTS'] = temp_df['PTS'].replace(['Did Not Play', 'Did Not Dress', 'Not With Team'], 0)
    temp_df['PTS'] = temp_df['PTS'].astype(int)
    # Use the leading digits of the box score file name as the game key
    temp_df['Date'] = re.findall(r"\d+", link.split('/')[-1].split('.html')[0])[0]
    temp_df = temp_df.rename(columns={'Starters': 'Name', 'PTS': 'Points'})
    frames.append(temp_df)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
playerdf = pd.concat(frames, sort=False).reset_index(drop=True)

playerdf = playerdf.pivot(index='Date',
                              columns='Name',
                              values='Points').fillna(0)

corr = playerdf.corr()

Then you may start to see some correlations:

Output:

print (corr.to_string())
Name                      Alex Caruso  Andre Ingram  Brandon Ingram  Isaac Bonga  Ivica Zubac  JaVale McGee  Jemerrio Jones  Johnathan Williams  Josh Hart  Kentavious Caldwell-Pope  Kyle Kuzma  Lance Stephenson  LeBron James  Lonzo Ball  Michael Beasley  Mike Muscala  Moritz Wagner  Rajon Rondo  Reggie Bullock  Scott Machado  Sviatoslav Mykhailiuk  Tyson Chandler
Name                                                                                                                                                                                                                                                                                                                                                                         
Alex Caruso                  1.000000           NaN       -0.502772     0.356931    -0.223081      0.360708        0.520267            0.635980  -0.377755                  0.331362   -0.427086         -0.279960     -0.258477   -0.395673        -0.190208      0.614652       0.462480     0.282011        0.295477       0.180002              -0.240216       -0.272816
Andre Ingram                      NaN           NaN             NaN          NaN          NaN           NaN             NaN                 NaN        NaN                       NaN         NaN               NaN           NaN         NaN              NaN           NaN            NaN          NaN             NaN            NaN                    NaN             NaN
Brandon Ingram              -0.502772           NaN        1.000000    -0.311075     0.280328     -0.212760       -0.252852           -0.502750   0.064457                 -0.330685    0.015547         -0.034681     -0.116722    0.068030         0.256519     -0.273952      -0.423331    -0.075037       -0.010224      -0.167714              -0.029635        0.142737
Isaac Bonga                  0.356931           NaN       -0.311075     1.000000    -0.014284      0.052887        0.212814            0.317496  -0.170178                  0.018247   -0.210940          0.033076     -0.215860   -0.107862        -0.046352      0.249809       0.506899     0.069940       -0.003765       0.237553               0.191829       -0.104224
Ivica Zubac                 -0.223081           NaN        0.280328    -0.014284     1.000000     -0.348919       -0.125094           -0.255467   0.097697                  0.003421    0.032512          0.154095     -0.462171    0.142622         0.449249     -0.204575      -0.046258    -0.060691       -0.268645      -0.082973               0.308421        0.115336
JaVale McGee                 0.360708           NaN       -0.212760     0.052887    -0.348919      1.000000        0.131512            0.203464  -0.195306                  0.088362   -0.161654          0.007220      0.071916   -0.250259        -0.189589      0.220799       0.025695     0.074450        0.051457       0.142273              -0.038746       -0.271256
Jemerrio Jones               0.520267           NaN       -0.252852     0.212814    -0.125094      0.131512        1.000000            0.544439  -0.246812                  0.401716   -0.362906         -0.201776     -0.287865   -0.191340        -0.111905      0.805160       0.250571     0.039685       -0.040080      -0.032381              -0.126897       -0.151910
Johnathan Williams           0.635980           NaN       -0.502750     0.317496    -0.255467      0.203464        0.544439            1.000000  -0.223735                  0.216588   -0.335991         -0.076575     -0.112725   -0.280153        -0.212707      0.530976       0.638914     0.057808        0.074619       0.179093              -0.220783       -0.310233
Josh Hart                   -0.377755           NaN        0.064457    -0.170178     0.097697     -0.195306       -0.246812           -0.223735   1.000000                 -0.202327    0.112090          0.106432      0.062429    0.359006         0.053293     -0.312218      -0.323296    -0.165224       -0.300856      -0.163708               0.190857        0.196536
Kentavious Caldwell-Pope     0.331362           NaN       -0.330685     0.018247     0.003421      0.088362        0.401716            0.216588  -0.202327                  1.000000   -0.254029         -0.053019     -0.329252   -0.151266        -0.087638      0.381221       0.187377     0.011464        0.038160       0.039444               0.037875        0.050367
Kyle Kuzma                  -0.427086           NaN        0.015547    -0.210940     0.032512     -0.161654       -0.362906           -0.335991   0.112090                 -0.254029    1.000000          0.039111      0.187677    0.355282         0.081492     -0.370250      -0.338748    -0.254589       -0.105824       0.049026               0.018252        0.141192
Lance Stephenson            -0.279960           NaN       -0.034681     0.033076     0.154095      0.007220       -0.201776           -0.076575   0.106432                 -0.053019    0.039111          1.000000     -0.048462    0.085465         0.009354     -0.265252      -0.066810    -0.071756       -0.357791       0.079382               0.264893        0.044603
LeBron James                -0.258477           NaN       -0.116722    -0.215860    -0.462171      0.071916       -0.287865           -0.112725   0.062429                 -0.329252    0.187677         -0.048462      1.000000   -0.021212        -0.417934     -0.336107      -0.227264     0.032238        0.098842      -0.119156              -0.177819       -0.099600
Lonzo Ball                  -0.395673           NaN        0.068030    -0.107862     0.142622     -0.250259       -0.191340           -0.280153   0.359006                 -0.151266    0.355282          0.085465     -0.021212    1.000000         0.078883     -0.312913      -0.298580    -0.442047       -0.410911      -0.126914               0.211892        0.520982
Michael Beasley             -0.190208           NaN        0.256519    -0.046352     0.449249     -0.189589       -0.111905           -0.212707   0.053293                 -0.087638    0.081492          0.009354     -0.417934    0.078883         1.000000     -0.183008       0.025792    -0.254584       -0.240322      -0.074226               0.167759        0.073540
Mike Muscala                 0.614652           NaN       -0.273952     0.249809    -0.204575      0.220799        0.805160            0.530976  -0.312218                  0.381221   -0.370250         -0.265252     -0.336107   -0.312913        -0.183008      1.000000       0.306389     0.203155        0.207427      -0.052954              -0.207525       -0.248431
Moritz Wagner                0.462480           NaN       -0.423331     0.506899    -0.046258      0.025695        0.250571            0.638914  -0.323296                  0.187377   -0.338748         -0.066810     -0.227264   -0.298580         0.025792      0.306389       1.000000     0.016732        0.147417       0.341310              -0.074224       -0.206353
Rajon Rondo                  0.282011           NaN       -0.075037     0.069940    -0.060691      0.074450        0.039685            0.057808  -0.165224                  0.011464   -0.254589         -0.071756      0.032238   -0.442047        -0.254584      0.203155       0.016732     1.000000        0.378034      -0.021978              -0.267364       -0.450237
Reggie Bullock               0.295477           NaN       -0.010224    -0.003765    -0.268645      0.051457       -0.040080            0.074619  -0.300856                  0.038160   -0.105824         -0.357791      0.098842   -0.410911        -0.240322      0.207427       0.147417     0.378034        1.000000      -0.069539              -0.272518       -0.296419
Scott Machado                0.180002           NaN       -0.167714     0.237553    -0.082973      0.142273       -0.032381            0.179093  -0.163708                  0.039444    0.049026          0.079382     -0.119156   -0.126914        -0.074226     -0.052954       0.341310    -0.021978       -0.069539       1.000000              -0.084170       -0.100761
Sviatoslav Mykhailiuk       -0.240216           NaN       -0.029635     0.191829     0.308421     -0.038746       -0.126897           -0.220783   0.190857                  0.037875    0.018252          0.264893     -0.177819    0.211892         0.167759     -0.207525      -0.074224    -0.267364       -0.272518      -0.084170               1.000000        0.255530
Tyson Chandler              -0.272816           NaN        0.142737    -0.104224     0.115336     -0.271256       -0.151910           -0.310233   0.196536                  0.050367    0.141192          0.044603     -0.099600    0.520982         0.073540     -0.248431      -0.206353    -0.450237       -0.296419      -0.100761               0.255530        1.000000
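One caveat worth keeping in mind: .fillna(0) treats every game a player missed as a zero-point night, which can pull correlations around. An alternative, sketched below on made-up numbers, is to leave the NaN values in place so that .corr() uses only the games both players have values for, with min_periods setting how many shared games a pair needs. Whether that is the better choice depends on what you want "scored well together" to mean.

```python
import numpy as np
import pandas as pd

# Hypothetical scores: Player A missed the last two games
wide = pd.DataFrame(
    {
        "Player A": [10, 20, 30, np.nan, np.nan],
        "Player B": [12, 18, 33, 25, 27],
    }
)

# Absences become 0-point games and drag the correlation down here
corr_filled = wide.fillna(0).corr()

# Pairwise: only rows where both players have a value are used
corr_pairwise = wide.corr(min_periods=3)

# Requiring more shared games than the pair has yields NaN for that pair
corr_strict = wide.corr(min_periods=4)
```

In this toy case the pairwise correlation is noticeably higher than the zero-filled one, because the two filled zeros look like bad nights for Player A on dates when Player B scored normally.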

Heatmap:

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

f, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=bool), cmap=sns.diverging_palette(220, 10, as_cmap=True),
            square=True, ax=ax)
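If you would rather read off the strongest pairs than eyeball a heatmap, the matrix can be flattened into a ranked list. This is a rough sketch: `strongest_pairs` is a helper name I made up, and the demo matrix reuses three players' values from the season output above.

```python
import numpy as np
import pandas as pd

def strongest_pairs(corr, n=5):
    """Return the n player pairs with the largest absolute correlation."""
    # Keep only the cells above the diagonal so each pair appears once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    # Give the two axes distinct names before stacking them into one index
    labelled = corr.where(upper).rename_axis(index="player_1", columns="player_2")
    pairs = labelled.stack().dropna()
    # Rank by strength regardless of sign, but keep the signed values
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(n)

names = ["Alex Caruso", "Brandon Ingram", "Mike Muscala"]
corr_demo = pd.DataFrame(
    [
        [1.0, -0.502772, 0.614652],
        [-0.502772, 1.0, -0.273952],
        [0.614652, -0.273952, 1.0],
    ],
    index=names,
    columns=names,
)

top = strongest_pairs(corr_demo, n=2)
```

Calling strongest_pairs(corr) on the full season matrix should work the same way; pairs involving an all-NaN player are dropped.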

