Dropping the home team

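import pandas as pd

# The URL goes in a list so the loop visits whole URLs, not single characters.
box_score_example_url = ['http://www.basketball-reference.com//boxscores/201910230POR.html']
dfbox = []
for eachBox in box_score_example_url:
    dfz = pd.read_html(eachBox)  # every table found on the page
    dfbox.append(dfz[0])         # keep only the first one
boxbox_awayteam = pd.concat(dfbox)
boxbox_awayteam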

1 answer

User

#1 · Posted 2024-09-28 01:25:25

You can use BeautifulSoup with the CSS selector [id$="-game-basic"] table to select only the two basic tables, then load them with pd.read_html():

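import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup


url = 'https://www.basketball-reference.com/boxscores/201910220TOR.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Tables inside elements whose id ends with "-game-basic" (one per team)
my_tables = soup.select('[id$="-game-basic"] table')

# Parse each table (wrapped in StringIO, since newer pandas versions warn on
# literal HTML strings), then drop the top level of the two-level column header
df_1 = pd.read_html(StringIO(str(my_tables[0])))[0].droplevel(0, axis=1)
df_2 = pd.read_html(StringIO(str(my_tables[1])))[0].droplevel(0, axis=1)

print(df_1)
print(df_2)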

Prints:

                    Starters            MP  ...           PTS           +/-
0               Jrue Holiday         41:05  ...            13           -14
1             Brandon Ingram         35:06  ...            22           -19
2                J.J. Redick         27:03  ...            16           -14
3                 Lonzo Ball         24:50  ...             8            -7
4             Derrick Favors         20:46  ...             6           -12
5                   Reserves            MP  ...           PTS           +/-
6                  Josh Hart         28:10  ...            15            -1
7               Nicolò Melli         19:37  ...            14           +11
8           Kenrich Williams         18:02  ...             3           +11
9              Frank Jackson         13:51  ...             9            +7
10             Jahlil Okafor         12:29  ...             8            -7
11             E'Twaun Moore         12:06  ...             5            -1
12  Nickeil Alexander-Walker         11:55  ...             3            +6
13              Jaxson Hayes  Did Not Play  ...  Did Not Play  Did Not Play
14               Team Totals           265  ...           122           NaN

[15 rows x 21 columns]
           Starters            MP  ...           PTS           +/-
0        Kyle Lowry         44:59  ...            22            -1
1     Fred VanVleet         44:21  ...            34           +18
2     Pascal Siakam         38:09  ...            34            +5
3        OG Anunoby         35:48  ...            11           +12
4        Marc Gasol         31:55  ...             6            -2
5          Reserves            MP  ...           PTS           +/-
6     Norman Powell         28:38  ...             5            +2
7       Serge Ibaka         26:00  ...            13            +6
8     Terence Davis         15:10  ...             5             0
9       Matt Thomas  Did Not Play  ...  Did Not Play  Did Not Play
10    Chris Boucher  Did Not Play  ...  Did Not Play  Did Not Play
11  Stanley Johnson  Did Not Play  ...  Did Not Play  Did Not Play
12   Malcolm Miller  Did Not Play  ...  Did Not Play  Did Not Play
13  Dewan Hernandez  Did Not Play  ...  Did Not Play  Did Not Play
14      Team Totals           265  ...           130           NaN

[15 rows x 21 columns]
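
As the output above shows, each table still contains a repeated header row (the "Reserves" row), a "Team Totals" row, and "Did Not Play" text in the stat columns. Here is a minimal sketch of one way to tidy that up. Note that `clean_box_score` and the `Player` column name are hypothetical names chosen for illustration, and the sample frame is built by hand from a few values in the printed output rather than scraped.

```python
import pandas as pd


def clean_box_score(df):
    """Return a copy without header/total rows and with numeric stat columns."""
    # Drop the repeated header row ("Reserves") and the "Team Totals" summary row.
    players = df[~df["Starters"].isin(["Reserves", "Team Totals"])].copy()
    players = players.rename(columns={"Starters": "Player"})
    # Convert every column except the name and minutes played to numbers;
    # text such as "Did Not Play" becomes NaN.
    stat_cols = [c for c in players.columns if c not in ("Player", "MP")]
    players[stat_cols] = players[stat_cols].apply(pd.to_numeric, errors="coerce")
    return players.reset_index(drop=True)


# Hand-made frame mimicking the layout printed above (only a few columns).
sample = pd.DataFrame({
    "Starters": ["Kyle Lowry", "Reserves", "Matt Thomas", "Team Totals"],
    "MP": ["44:59", "MP", "Did Not Play", "265"],
    "PTS": ["22", "PTS", "Did Not Play", "130"],
    "+/-": ["-1", "+/-", "Did Not Play", None],
})
cleaned = clean_box_score(sample)
print(cleaned)
```

With real data you would probably call it as `clean_box_score(df_1)`. The minutes column is left as text here, since values like 44:59 need their own conversion.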

EDIT: To run this in a loop, you can use the following example:

import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

# Season schedule page; its month filter links to one schedule page per month
url = 'https://www.basketball-reference.com/leagues/NBA_2020_games.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

def get_tables(url):
    """Return the two basic box-score tables of one game page."""
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    my_tables = soup.select('[id$="-game-basic"] table')

    df_1 = pd.read_html(StringIO(str(my_tables[0])))[0].droplevel(0, axis=1)
    df_2 = pd.read_html(StringIO(str(my_tables[1])))[0].droplevel(0, axis=1)

    return df_1, df_2

# Outer loop: one schedule page per month
for a in soup.select('.filter a'):
    u = 'https://www.basketball-reference.com' + a['href']
    print(u)
    soup2 = BeautifulSoup(requests.get(u).content, 'html.parser')
    # Inner loop: every box-score link on that month's page
    for a2 in soup2.select('td a[href^="/boxscores/"]'):
        u2 = 'https://www.basketball-reference.com' + a2['href']
        t1, t2 = get_tables(u2)
        print(u2)
        print(t1)
        print(t2)
        print('-' * 80)

Prints:

https://www.basketball-reference.com/leagues/NBA_2020_games-october.html
https://www.basketball-reference.com/boxscores/201910220TOR.html
                    Starters            MP  ...           PTS           +/-
0               Jrue Holiday         41:05  ...            13           -14
1             Brandon Ingram         35:06  ...            22           -19
2                J.J. Redick         27:03  ...            16           -14
3                 Lonzo Ball         24:50  ...             8            -7
4             Derrick Favors         20:46  ...             6           -12
5                   Reserves            MP  ...           PTS           +/-
6                  Josh Hart         28:10  ...            15            -1
7               Nicolò Melli         19:37  ...            14           +11
8           Kenrich Williams         18:02  ...             3           +11
9              Frank Jackson         13:51  ...             9            +7
10             Jahlil Okafor         12:29  ...             8            -7
11             E'Twaun Moore         12:06  ...             5            -1
12  Nickeil Alexander-Walker         11:55  ...             3            +6
13              Jaxson Hayes  Did Not Play  ...  Did Not Play  Did Not Play
14               Team Totals           265  ...           122           NaN

[15 rows x 21 columns]
           Starters            MP  ...           PTS           +/-
0        Kyle Lowry         44:59  ...            22            -1
1     Fred VanVleet         44:21  ...            34           +18
2     Pascal Siakam         38:09  ...            34            +5
3        OG Anunoby         35:48  ...            11           +12
4        Marc Gasol         31:55  ...             6            -2
5          Reserves            MP  ...           PTS           +/-
6     Norman Powell         28:38  ...             5            +2
7       Serge Ibaka         26:00  ...            13            +6
8     Terence Davis         15:10  ...             5             0
9       Matt Thomas  Did Not Play  ...  Did Not Play  Did Not Play
10    Chris Boucher  Did Not Play  ...  Did Not Play  Did Not Play
11  Stanley Johnson  Did Not Play  ...  Did Not Play  Did Not Play
12   Malcolm Miller  Did Not Play  ...  Did Not Play  Did Not Play
13  Dewan Hernandez  Did Not Play  ...  Did Not Play  Did Not Play
14      Team Totals           265  ...           130           NaN

[15 rows x 21 columns]
--------------------------------------------------------------------------------
https://www.basketball-reference.com/boxscores/201910220LAC.html
                    Starters            MP  ...           PTS           +/-
0              Anthony Davis         37:22  ...            25            +3
1               LeBron James         36:00  ...            18            -8
2                Danny Green         32:20  ...            28            +7


...and so on.
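
If the goal is a single DataFrame (as with the `pd.concat` call in the question) rather than printed output, the per-game tables can be tagged and stacked. The sketch below is only one possible approach: `combine_box_scores`, `game_id`, and `side` are hypothetical names, and it assumes, as in the output above for 201910220TOR, that the first table is the visiting team and the second is the home team. The sample frames are hand-made stand-ins for scraped tables.

```python
import pandas as pd


def combine_box_scores(games):
    """Stack per-game tables into one DataFrame.

    `games` is an iterable of (url, first_df, second_df) tuples. The first
    table is ASSUMED to be the away team and the second the home team.
    """
    frames = []
    for url, away_df, home_df in games:
        # ".../boxscores/201910220TOR.html" -> "201910220TOR"
        game_id = url.rsplit("/", 1)[-1].split(".")[0]
        for side, df in (("away", away_df), ("home", home_df)):
            # assign() returns a copy, so the input frames stay untouched
            frames.append(df.assign(game_id=game_id, side=side))
    return pd.concat(frames, ignore_index=True)


# Hand-made stand-ins for the two tables of one game.
away = pd.DataFrame({"Starters": ["Jrue Holiday"], "PTS": ["13"]})
home = pd.DataFrame({"Starters": ["Kyle Lowry"], "PTS": ["22"]})
games = [
    ("https://www.basketball-reference.com/boxscores/201910220TOR.html", away, home),
]

combined = combine_box_scores(games)
away_only = combined[combined["side"] == "away"]  # keep only the visiting team
print(away_only)
```

Inside the loop you would collect `(u2, t1, t2)` into a list instead of printing, then pass that list to the function once at the end. Since the loop requests a lot of pages, adding a pause such as `time.sleep()` between requests is probably a good idea.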
