How do I merge rows in a DataFrame based on a column value?

    gameID  Won/Lost  Home  Away  metric2  metric3  metric4  team1  team2  team3  team4
2017020001         1     1     0       10       10       10      1      0      0      0
2017020001         0     0     1       10       10       10      0      1      0      0

import numpy as np
import pandas as pd
import requests
import json
from sklearn import preprocessing
from sklearn.preprocessing import OneHotEncoder

results = []
for game_id in range(2017020001, 2017020010, 1):
    url = 'https://statsapi.web.nhl.com/api/v1/game/{}/boxscore'.format(game_id)
    r = requests.get(url)
    game_data = r.json()

    for homeaway in ['home','away']:
        game_dict = game_data.get('teams').get(homeaway).get('teamStats').get('teamSkaterStats')
        game_dict['team'] = game_data.get('teams').get(homeaway).get('team').get('name')
        game_dict['homeaway'] = homeaway
        game_dict['game_id'] = game_id
        results.append(game_dict)

df = pd.DataFrame(results)

df['Won/Lost'] = df.groupby('game_id')['goals'].apply(lambda g: (g == g.max()).map({True: 1, False: 0}))
df["faceOffWinPercentage"] = df["faceOffWinPercentage"].astype('float')
df["powerPlayPercentage"] = df["powerPlayPercentage"].astype('float')
df["team"] = df["team"].astype('category')
df = pd.get_dummies(df, columns=['homeaway'])
df = pd.get_dummies(df, columns=['team'])

2 Answers

User

Answer 1 · edited 2024-09-25 00:24:45

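I assume you are working with the usual bread and butter: NumPy, pandas and co.?

If so, I further assume that your table is currently stored in a pandas.DataFrame instance named "df".

Split df into two DataFrames and then join them: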

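df_team1 = df[df['Won/Lost']==1]
df_team2 = df[df['Won/Lost']==0]
# join matches against the other frame's index, so index both frames by gameID first
final_df = df_team1.set_index('gameID').join(df_team2.set_index('gameID'), lsuffix='_team1', rsuffix='_team2')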

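Of course, you can edit this to better fit your purpose, for example by building the two DataFrames from the Home/Away columns instead, and so on.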
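As a minimal sketch of the Home/Away variant mentioned above: the toy values and the names df_home and df_away are assumptions made for illustration, not part of the original answer. Both halves are indexed by gameID so that join pairs one home row with one away row per game.

```python
import pandas as pd

# Toy data shaped like the table in the question (values are illustrative).
df = pd.DataFrame({
    "gameID":   [2017020001, 2017020001, 2017020002, 2017020002],
    "Won/Lost": [1, 0, 0, 1],
    "Home":     [1, 0, 1, 0],
    "Away":     [0, 1, 0, 1],
    "metric2":  [10, 11, 12, 13],
})

# Split on the Home flag instead of Won/Lost and index both halves by gameID.
df_home = df[df["Home"] == 1].set_index("gameID")
df_away = df[df["Home"] == 0].set_index("gameID")

# join aligns on the index; the suffixes disambiguate the shared column names.
final_df = df_home.join(df_away, lsuffix="_home", rsuffix="_away")
print(final_df)
```

Splitting on Home rather than Won/Lost avoids surprises if a game ever has no unique winner in the data, because each game always has exactly one home side.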

BR, Ben :]

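User

Answer 2 · edited 2024-09-25 00:24:45

This is based on the assumption that each gameID has exactly two rows and that you want to group by that ID (it also assumes that I understood the question).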
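The two-rows-per-game assumption is easy to check before reshaping. The following is only a sketch on made-up data; the names rows_per_game and bad_games are hypothetical.

```python
import pandas as pd

# Illustrative frame: game 3 has only one row and breaks the assumption.
df = pd.DataFrame({
    "gameID": [1, 1, 2, 2, 3],
    "Home":   [1, 0, 1, 0, 1],
})

# Count rows per game and list every gameID that does not have exactly two.
rows_per_game = df.groupby("gameID").size()
bad_games = rows_per_game[rows_per_game != 2].index.tolist()
print(bad_games)
```

If bad_games is not empty, those games should probably be inspected or dropped before applying either solution.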

Improved solution

Given a DataFrame df such as

       gameID  Won/Lost  Home  Away  metric2  metric3  metric4  team1  team2  team3  team4
0  2017020001         1     1     0       10       10       10      1      0      0      0
1  2017020001         0     0     1       10       10       10      0      1      0      0
2  2017020002         1     1     0       10       10       10      1      0      0      0
3  2017020002         0     0     1       10       10       10      0      1      0      0

you can use pd.merge (and some data munging) as follows:

[code block not preserved in the source]

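(I kept the prefix on Won/Lost because it indicates that this is the home team's statistic. Also, if anyone knows a more elegant way to add the prefixes without having to rename gameID back, please leave a comment.)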
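The merge snippet itself is not shown above, so what follows is not the answerer's code. It is one possible sketch of a pd.merge approach that is consistent with the remarks in the note: Won/Lost keeps its home prefix, and gameID is renamed back after prefixing. The h_ and a_ prefixes, the toy values, and the choice to drop Home/Away are assumptions.

```python
import pandas as pd

# Toy data shaped like the example frame (values are illustrative).
df = pd.DataFrame({
    "gameID":   [2017020001, 2017020001, 2017020002, 2017020002],
    "Won/Lost": [1, 0, 1, 0],
    "Home":     [1, 0, 1, 0],
    "Away":     [0, 1, 0, 1],
    "metric2":  [10, 20, 30, 40],
    "team1":    [1, 0, 1, 0],
    "team2":    [0, 1, 0, 1],
})

is_home = df["Home"] == 1

# The prefixes carry the home/away information, so the flag columns are dropped.
# Won/Lost is kept only on the home side.
home = df[is_home].drop(columns=["Home", "Away"])
away = df[~is_home].drop(columns=["Home", "Away", "Won/Lost"])

# add_prefix renames every column, so gameID is renamed back to act as the key.
home = home.add_prefix("h_").rename(columns={"h_gameID": "gameID"})
away = away.add_prefix("a_").rename(columns={"a_gameID": "gameID"})

# validate raises if a gameID appears more than once on either side.
merged = pd.merge(home, away, on="gameID", validate="one_to_one")
print(merged)
```

The validate argument turns the "exactly two rows per gameID" assumption into an explicit error instead of silently duplicated rows.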

Original attempt

After grouping, the following function can be applied

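def munge(group):
    # Boolean mask that selects the home team's row
    is_home = group.Home == 1
    # Keep the game result from the home team's point of view
    wonlost = group.loc[is_home, 'Won/Lost'].reset_index(drop=True)
    # Keep only the metric and team columns
    group = group.loc[:, 'metric2':]
    # Prefix each side and reset the index so both rows align at position 0
    home = group[is_home].add_prefix('h_').reset_index(drop=True)
    away = group[~is_home].add_prefix('a_').reset_index(drop=True)
    # Put everything side by side in a single row
    return pd.concat([wonlost, home, away], axis=1)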

... like this:

>>> df.groupby('gameID').apply(munge).reset_index(level=1, drop=True)
            Won/Lost  h_metric2  h_metric3  h_metric4  h_team1  h_team2  h_team3  h_team4  a_metric2  a_metric3  a_metric4  a_team1  a_team2  a_team3  a_team4
gameID                                                                                                                                                        
2017020001         1         10         10         10        1        0        0        0         10         10         10        0        1        0        0
2017020002         1         10         10         10        1        0        0        0         10         10         10        0        1        0        0
