I have a DataFrame like this:
H = Home win
D = Draw
A = Away win
Datetime HomeTeam AwayTeam HG AG FT
0 2021-02-17 22:00:00 Colo Colo U. De Concepcion 1 0 H
1 2021-02-15 14:30:00 Cobresal U. Espanola 4 1 H
2 2021-02-14 22:00:00 Deportes Iquique S. Wanderers 2 0 H
3 2021-02-14 22:00:00 La Serena A. Italiano 0 2 A
4 2021-02-14 22:00:00 O'Higgins Colo Colo 1 1 D
... ... ... ... ... ... ...
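You can rebuild the five sample rows above as a DataFrame to try out the approaches in this thread. The column types (datetime for `Datetime`, integers for the goals, strings for the rest) are an assumption about the real data. The last check confirms that `FT` follows the H/D/A legend, given `HG` and `AG`.

```python
import numpy as np
import pandas as pd

# The five sample rows shown above (the "..." rows are not reproduced).
df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime([
            "2021-02-17 22:00:00",
            "2021-02-15 14:30:00",
            "2021-02-14 22:00:00",
            "2021-02-14 22:00:00",
            "2021-02-14 22:00:00",
        ]),
        "HomeTeam": ["Colo Colo", "Cobresal", "Deportes Iquique", "La Serena", "O'Higgins"],
        "AwayTeam": ["U. De Concepcion", "U. Espanola", "S. Wanderers", "A. Italiano", "Colo Colo"],
        "HG": [1, 4, 2, 0, 1],
        "AG": [0, 1, 0, 2, 1],
        "FT": ["H", "H", "H", "A", "D"],
    }
)

# FT is derived from the score: H = home win, A = away win, D = draw.
expected_ft = np.where(df["HG"] > df["AG"], "H", np.where(df["HG"] < df["AG"], "A", "D"))
assert (df["FT"] == expected_ft).all()
print(df)
```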
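I want to calculate, for each team, the number of wins per match over the previous rows. I can do this with the following code:
# home matches per team, skipping the first row of each group (iloc[1:])
_hometeam_count = df.groupby("HomeTeam").apply(lambda x: x.iloc[1:, :]["FT"].count())
# home wins per team, again skipping the first row of each group
_hometeam_sum = df[df['FT'] == 'H'].groupby("HomeTeam").apply(lambda x: x.iloc[1:, :]["FT"].count())
# map the per-team ratio back onto every row
df1["WinsH/MPH"] = df["HomeTeam"].apply(lambda x: (_hometeam_sum.loc[x] if x in _hometeam_sum.index else 0) / (_hometeam_count.loc[x] if x in _hometeam_count.index else 0))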
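But there is a problem: the count always starts from the first row. For the match in row 19, for example, the count should start from that row, not from row 1. How can I fix the code so that the calculation is done separately for each row?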
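A simplified version of the same idea (not the exact code above) shows where the problem comes from. A groupby aggregate yields a single number per team, so mapping it back onto the rows gives every row of that team the same value, whatever the match date. The three-match data below is hypothetical.

```python
import pandas as pd

# Hypothetical data: one team with three home matches, newest first (as in the question).
df = pd.DataFrame({
    "Datetime": pd.to_datetime(["2021-02-17", "2021-02-10", "2021-02-03"]),
    "HomeTeam": ["Colo Colo"] * 3,
    "FT": ["H", "D", "H"],
})

# One aggregate per team: home wins / home matches over ALL rows of that team.
per_team = df.groupby("HomeTeam")["FT"].agg(lambda s: (s == "H").sum() / len(s))

# Mapping it back gives every row of the team the same value,
# regardless of when each match was played.
df["WinsH/MPH"] = df["HomeTeam"].map(per_team)
print(df)
```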
Datetime HomeTeam AwayTeam HG AG FT HG_1ST AG_1ST FT_1ST HG_2ND AG_2ND FT_2ND 1 X 2 WinsH/MPH
0 2021-02-17 22:00:00 Colo Colo U. De Concepcion 1 0 H 1 0 H 0 0 D 2.53 3.01 2.80 0.352941
2 2021-02-14 22:00:00 Deportes Iquique S. Wanderers 2 0 H 0 0 D 2 0 H 3.13 3.55 2.08 0.312500
3 2021-02-14 22:00:00 La Serena A. Italiano 0 2 A 0 0 D 0 2 A 2.14 3.22 3.31 0.312500
4 2021-02-14 22:00:00 O'Higgins Colo Colo 1 1 D 0 0 D 1 1 D 2.27 3.14 3.10 0.187500
.. ... ... ... .. .. .. ... ... ... ... ... ... ... ... ... ...
302 2020-01-26 16:00:00 S. Wanderers U. Catolica 0 3 A 0 1 A 0 2 A 3.75 3.23 1.97 0.500000
303 2020-01-26 00:30:00 A. Italiano Cobresal 4 1 H 1 1 D 3 0 H 2.12 3.31 3.23 0.375000
304 2020-01-25 16:00:00 Antofagasta Coquimbo 2 1 H 2 0 H 0 1 A 2.19 3.20 3.16 0.437500
305 2020-01-25 01:00:00 O'Higgins Union La Calera 1 2 A 0 1 A 1 1 D 2.24 3.17 3.10 0.187500
306 2020-01-24 22:30:00 Everton U. De Concepcion 2 1 H 0 1 A 2 0 H 1.76 3.45 4.52 0.375000
In the last row, the expected value of "WinsH/MPH" is 0, because there are no earlier matches.
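One way to get a per-row value is to sort the rows oldest first, then combine `groupby(...).cumcount()` (earlier home matches) with a cumulative sum that excludes the current match (earlier home wins). This sketch makes three assumptions: `Datetime` is a datetime column, only the team's earlier *home* matches count (matching the `FT == 'H'` filter in the question), and a team never has two home matches with the same timestamp. The function name `prior_home_win_ratio` and the four-row data are hypothetical.

```python
import pandas as pd


def prior_home_win_ratio(df: pd.DataFrame) -> pd.Series:
    """Home wins / home matches for each HomeTeam, using only matches played
    BEFORE the current row. Returns 0 when there is no earlier home match."""
    # Oldest first; a stable sort keeps equal timestamps in their current order.
    ordered = df.sort_values("Datetime", kind="mergesort")
    is_win = (ordered["FT"] == "H").astype(int)
    grouped = is_win.groupby(ordered["HomeTeam"])
    # Number of earlier home matches for this team (0 for its first one).
    prior_matches = grouped.cumcount()
    # Earlier home wins: running total minus the current match's own result.
    prior_wins = grouped.cumsum() - is_win
    # 0 / 0 gives NaN for a team's first match; report 0 instead.
    ratio = (prior_wins / prior_matches).fillna(0.0)
    # Back to the caller's row order (index labels are preserved throughout).
    return ratio.reindex(df.index)


# Hypothetical data, newest first like the question's DataFrame.
df = pd.DataFrame({
    "Datetime": pd.to_datetime(["2021-02-17", "2021-02-10", "2021-02-03", "2021-01-27"]),
    "HomeTeam": ["Colo Colo", "Colo Colo", "Cobresal", "Colo Colo"],
    "FT": ["H", "D", "H", "H"],
})
df["WinsH/MPH"] = prior_home_win_ratio(df)
print(df)
```

Here the oldest row gets 0 because nothing precedes it, which is the behaviour described for the last row above. Because the sort and the group-wise operations keep the original index labels, the result lines up with the original newest-first rows without any manual reversing.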
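You can use the .cumsum() method for this kind of analysis. If you have a column of floats or integers, it easily adds up the number of wins up to each point. Make sure the rows are in the right order (do a few simple tests). For example:
And so on.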
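Here is a minimal sketch of the cumsum idea for a single team, assuming a 0/1 win indicator. Shifting by one before accumulating excludes the current match, so each value counts only earlier wins. Row order matters: on newest-first data (as in the question's DataFrame), the running total goes the wrong way. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical win indicator for one team, oldest match first (1 = win, 0 = no win).
wins = pd.Series([1, 0, 1, 1])

# Wins up to and including each match.
wins_incl = wins.cumsum()                              # 1, 1, 2, 3
# Wins strictly before each match: shift by one, then accumulate.
wins_before = wins.shift(1, fill_value=0).cumsum()     # 0, 1, 1, 2
# Matches strictly before each match.
matches_before = pd.Series(range(len(wins)))           # 0, 1, 2, 3

# Same data, newest first. Accumulating in this order credits the oldest
# match (index label 0) with wins that happened after it.
newest_first = wins.iloc[::-1]
wrong = newest_first.shift(1, fill_value=0).cumsum()

print(wins_before.loc[0], wrong.loc[0])  # 0 vs 2
```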