Conditional cumulative sum across two columns

import pandas as pd

df = pd.DataFrame([
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Gothenburg", "Malmo", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Gothenburg", "Malmo", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
    ["Malmo", "Gothenburg", 2018, 1, 1],
    ["Malmo", "Gothenburg", 2018, 0, 3],
])
df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']

# Cumulative sum for home/away team, shifted by 1 row
df["H_cumsum"] = df.groupby(['H_team', "Year"])['H_points'].transform(
    lambda x: x.cumsum().shift())
df["A_cumsum"] = df.groupby(['A_team', "Year"])['A_points'].transform(
    lambda x: x.cumsum().shift())

print(df)

        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       NaN       NaN
2        Malmo  Gothenburg  2018         0         3       1.0       1.0
3   Gothenburg       Malmo  2018         1         1       1.0       1.0
4   Gothenburg       Malmo  2018         0         3       2.0       2.0
5   Gothenburg       Malmo  2018         1         1       2.0       5.0
6   Gothenburg       Malmo  2018         0         3       3.0       6.0
7        Malmo  Gothenburg  2018         0         3       1.0       4.0
8   Gothenburg       Malmo  2018         1         1       3.0       9.0
9        Malmo  Gothenburg  2018         0         3       1.0       7.0
10       Malmo  Gothenburg  2018         1         1       1.0      10.0
11       Malmo  Gothenburg  2018         0         3       2.0      11.0

row 0: Malmo = NaN, Gothenburg = NaN
row 1: Gothenburg = 1, Malmo = 1
row 2: Malmo = 1 + 1 = 2, Gothenburg = 1 + 1 = 2
row 3: Gothenburg = 1 + 1 + 3 = 5, Malmo = 1 + 1 + 0 = 2
row 4: Gothenburg = 1 + 1 + 3 + 1 = 6, Malmo = 1 + 1 + 0 + 1 = 3
And so on...
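The row-by-row arithmetic above can be sketched without pandas as a single pass that keeps one running total per team and year, regardless of whether the team played home or away. This is only an illustration of the intended logic: `running_totals`, `matches`, and `result` are hypothetical names, and `None` stands in for the NaN shown for a team's first match.

```python
# First five matches from the question:
# (home team, away team, year, home points, away points)
matches = [
    ("Gothenburg", "Malmo", 2018, 1, 1),
    ("Malmo", "Gothenburg", 2018, 1, 1),
    ("Malmo", "Gothenburg", 2018, 0, 3),
    ("Gothenburg", "Malmo", 2018, 1, 1),
    ("Gothenburg", "Malmo", 2018, 0, 3),
]


def running_totals(rows):
    """Return, per match, the points each side had BEFORE that match."""
    totals = {}  # (team, year) -> points collected so far, home and away combined
    out = []
    for home, away, year, h_pts, a_pts in rows:
        # Record the totals as they stood before this match (None if first match).
        out.append((totals.get((home, year)), totals.get((away, year))))
        # Then credit this match's points to both teams.
        totals[(home, year)] = totals.get((home, year), 0) + h_pts
        totals[(away, year)] = totals.get((away, year), 0) + a_pts
    return out


result = running_totals(matches)
for i, (h_before, a_before) in enumerate(result):
    print(i, h_before, a_before)
```

The point of the sketch is that the total is keyed by team, not by the column the team happens to appear in, which is what grouping on `H_team` and `A_team` separately misses.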

2 answers

User

#1 · edited 2024-06-25 05:58:39

I found a solution using stack, but it is not a good one:

import pandas as pd

df = pd.DataFrame([
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo","Gothenburg",  2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo" ,2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Gothenburg", "Malmo", 2018, 0, 3],
["Malmo", "Gothenburg", 2018, 0, 3],
["Gothenburg", "Malmo", 2018, 1, 1],
["Malmo", "Gothenburg", 2018, 0, 3],
[ "Malmo","Gothenburg", 2018, 1, 1],
[ "Malmo", "Gothenburg",2018, 0, 3],
])


df.columns = [['Team', 'Team', "Year", 'Points', 'Points'],
    ['Home', 'Away', 'Year', 'Home', 'Away']]

d1 = df.stack()
total = d1.groupby('Team').Points.apply(lambda x: x.shift().cumsum())
df = d1.assign(Total=total).unstack()

print(df)

   Points                  Team                  Year              Total           
     Away Home Year        Away        Home Year Away Home    Year  Away  Home Year
0     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   NaN   NaN  NaN
1     1.0  1.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   1.0   1.0  NaN
2     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   2.0   2.0  NaN
3     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   2.0   5.0  NaN
4     3.0  0.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   3.0   6.0  NaN
5     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   6.0   6.0  NaN
6     3.0  0.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0   7.0   7.0  NaN
7     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0   7.0  10.0  NaN
8     1.0  1.0  NaN       Malmo  Gothenburg  NaN  NaN  NaN  2018.0  10.0  10.0  NaN
9     3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  11.0  11.0  NaN
10    1.0  1.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  14.0  11.0  NaN
11    3.0  0.0  NaN  Gothenburg       Malmo  NaN  NaN  NaN  2018.0  15.0  12.0  NaN

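The points under Total/Away and Total/Home are correct. However, with all the extra unnecessary columns, the table becomes very hard to read. (Each row has another 10 columns not shown in this example, so it gets very messy.)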
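One possible way to cut down the clutter, assuming the stacked result has two-level columns like the printout above, is to keep only the `Total` block and rename its columns. The sketch below builds a small stand-in frame (`wide`, a hypothetical name, filled with the first three rows of the printout) rather than re-running the stack code.

```python
import pandas as pd

# Stand-in for the unstacked result: two-level columns, first three rows only.
cols = pd.MultiIndex.from_tuples([
    ("Points", "Away"), ("Points", "Home"),
    ("Total", "Away"), ("Total", "Home"),
])
wide = pd.DataFrame(
    [
        [1.0, 1.0, float("nan"), float("nan")],
        [1.0, 1.0, 1.0, 1.0],
        [3.0, 0.0, 2.0, 2.0],
    ],
    columns=cols,
)

# Selecting the top-level key returns just that block with single-level columns.
totals = wide["Total"].rename(columns={"Home": "H_cumsum", "Away": "A_cumsum"})
totals = totals[["H_cumsum", "A_cumsum"]]
print(totals)
```

Because `totals` keeps the original row index, it could then be joined back onto the original flat frame instead of carrying the stacked columns around.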

The desired output is:

        H_team      A_team  Year  H_points  A_points  H_cumsum  A_cumsum
0   Gothenburg       Malmo  2018         1         1       NaN       NaN
1        Malmo  Gothenburg  2018         1         1       1.0       1.0
2        Malmo  Gothenburg  2018         0         3       2.0       2.0
3   Gothenburg       Malmo  2018         1         1       5.0       2.0
4   Gothenburg       Malmo  2018         0         3       6.0       3.0
5   Gothenburg       Malmo  2018         1         1       6.0       6.0
6   Gothenburg       Malmo  2018         0         3       7.0       7.0
7        Malmo  Gothenburg  2018         0         3      10.0       7.0
8   Gothenburg       Malmo  2018         1         1      10.0      10.0
9        Malmo  Gothenburg  2018         0         3      11.0      11.0
10       Malmo  Gothenburg  2018         1         1      11.0      14.0
11       Malmo  Gothenburg  2018         0         3      12.0      15.0
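A minimal sketch of one way to get this output while adding only the two new columns: reshape to one row per team per match, take a shifted cumulative sum per team and year, then map the values back by match number. The helper names `long`, `side`, `match`, and `prior` are illustrative, and the approach assumes the rows are already in chronological order and that a team never plays itself.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Gothenburg", "Malmo", 2018, 0, 3],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Gothenburg", "Malmo", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
        ["Malmo", "Gothenburg", 2018, 1, 1],
        ["Malmo", "Gothenburg", 2018, 0, 3],
    ],
    columns=["H_team", "A_team", "Year", "H_points", "A_points"],
)

# One row per (match, team): the home side and the away side of every match.
home = (df[["H_team", "Year", "H_points"]]
        .rename(columns={"H_team": "team", "H_points": "points"})
        .assign(side="H", match=df.index))
away = (df[["A_team", "Year", "A_points"]]
        .rename(columns={"A_team": "team", "A_points": "points"})
        .assign(side="A", match=df.index))

# Restore match order so the cumulative sum runs chronologically.
long = (pd.concat([home, away], ignore_index=True)
        .sort_values("match", kind="mergesort"))

# Points each team had BEFORE the match (NaN for the team's first match).
long["prior"] = (long.groupby(["team", "Year"])["points"]
                 .transform(lambda s: s.cumsum().shift()))

# Map back to the wide frame; assignment aligns on the match number.
df["H_cumsum"] = long.loc[long["side"] == "H"].set_index("match")["prior"]
df["A_cumsum"] = long.loc[long["side"] == "A"].set_index("match")["prior"]

print(df)
```

Any additional columns in the original frame are left untouched, since only `H_cumsum` and `A_cumsum` are written back.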

User

#2 · edited 2024-06-25 05:58:39

On my end this seems to compute fine. It is a bit long-hand.

df.columns = ['H_team', 'A_team', "Year", 'H_points', 'A_points']

# H_team cumsum() for science.
df['H_cumsum'] = df[['H_team', 'H_points']].groupby(['H_team']).cumsum()

# A_team cumsum() for more science.
df['A_cumsum'] = df[['A_team', 'A_points']].groupby(['A_team']).cumsum()

# Creating a column for the sum of the two, or total points scored by either side.
df['T_sum'] = df['H_points'] + df['A_points']

# Creating the cumsum() column for T_sum
df['T_cumsum'] = df['T_sum'].cumsum()

print(df)
