How to sum the values of one column based on other columns in pandas?

Posted 2024-06-23 19:47:29


I'm working with the following dataframe (text version below):

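I need to work out which country has scored the most goals in tournaments since 2010. So far I've managed to manipulate the dataframe by filtering out friendlies like this: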

no_friendlies = df[df.tournament != "Friendly"]

Then I set the date column as the index so I could filter out all matches played before 2010:

no_friendlies_indexed = no_friendlies.set_index('date')
since_2010 = no_friendlies_indexed.loc['2010-01-01':]
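For reference, a minimal sketch of the same filtering step, assuming the `date` column is read in as strings (as it would be from a CSV). Converting it with `pd.to_datetime` and filtering with a boolean mask does the same job without moving `date` into the index, and without relying on the rows being in date order for the `.loc` slice. The tiny frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's data (not real results)
df = pd.DataFrame({
    "date": ["2009-06-01", "2010-03-03", "2012-07-14", "2015-10-10"],
    "home_team": ["Scotland", "England", "Wales", "England"],
    "away_team": ["England", "Wales", "Scotland", "Scotland"],
    "home_score": [1, 2, 0, 3],
    "away_score": [1, 0, 2, 1],
    "tournament": ["Friendly", "UEFA Euro qualification",
                   "FIFA World Cup qualification", "Friendly"],
})

# Parse the strings into real timestamps so comparisons are chronological
df["date"] = pd.to_datetime(df["date"])

# Keep non-friendly matches played on or after 1 January 2010
since_2010 = df[(df["tournament"] != "Friendly") & (df["date"] >= "2010-01-01")]
print(since_2010)
```

Only the 2010 and 2012 matches survive here: the 2009 match is too early and the 2015 match is a friendly.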

From this point on I'm lost, because I don't know how to add up each country's goals across both its home and away matches.

Thanks for any help/suggestions.

Edit:

Text version of the sample data:

    date        home_team  away_team  home_score  away_score  tournament  city     country   neutral
0   1872-11-30  Scotland   England    0           0           Friendly    Glasgow  Scotland  False
1   1873-03-08  England    Scotland   4           2           Friendly    London   England   False
2   1874-03-07  Scotland   England    2           1           Friendly    Glasgow  Scotland  False
3   1875-03-06  England    Scotland   2           2           Friendly    London   England   False
4   1876-03-04  Scotland   England    3           0           Friendly    Glasgow  Scotland  False
5   1876-03-25  Scotland   Wales      4           0           Friendly    Glasgow  Scotland  False
6   1877-03-03  England    Scotland   1           3           Friendly    London   England   False
7   1877-03-05  Wales      Scotland   0           2           Friendly    Wrexham  Wales     False
8   1878-03-02  Scotland   England    7           2           Friendly    Glasgow  Scotland  False
9   1878-03-23  Scotland   Wales      9           0           Friendly    Glasgow  Scotland  False
10  1879-01-18  England    Wales      2           1           Friendly    London   England   False

Edit 2:

I just tried this:

since_2010.groupby(['home_team', 'home_score']).sum()

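But it doesn't return the total number of goals each home team scored at home (if it did, I would repeat the same thing for the away team to get the total goals).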


2 Answers

Do a .groupby().sum() for the home team, then do the same for the away team, and add the two results together:

df_new = df.groupby('home_team')['home_score'].sum() + df.groupby('away_team')['away_score'].sum()

Output:

England     12
Scotland    34
Wales        1

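A more detailed explanation (following up on the comments):

  1. You only need to .groupby one column, home_team. In your attempt you grouped by ['home_team', 'home_score']. Your goal (no pun intended) is to get home_score.sum(), so you should not .groupby() on it. As you can see, ['home_score'] comes after the .groupby part of my code, which lets me take its .sum(). That gives you the totals for the home teams.
  2. Then do the same for away_team.
  3. At this point pandas is smart enough to line the two results up: because the home_team and away_team results are both indexed by country name, you can simply add them together.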
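One caveat to point 3: `+` between two Series aligns on the index labels, so a team that appears only as a home team (or only as an away team) in the filtered data ends up as `NaN` instead of getting its partial total. Every team in the sample plays on both sides, so the problem doesn't show there. A small sketch with hypothetical numbers, using `Series.add` with `fill_value=0` to treat a missing side as zero goals:

```python
import pandas as pd

# Hypothetical matches: Wales never plays away here
df = pd.DataFrame({
    "home_team": ["Scotland", "England", "Wales"],
    "away_team": ["England", "Scotland", "Scotland"],
    "home_score": [2, 1, 3],
    "away_score": [0, 4, 1],
})

home = df.groupby("home_team")["home_score"].sum()
away = df.groupby("away_team")["away_score"].sum()

# Plain '+' aligns on team names; Wales has no away entry, so it becomes NaN
print(home + away)

# fill_value=0 treats the missing side as zero goals instead
total = home.add(away, fill_value=0)
print(total)
```

With `fill_value=0` Wales keeps its 3 home goals; note the result is float because the alignment step introduces missing values before they are filled.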

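Reshape with pd.wide_to_long. The benefit is that it automatically creates a 'home_or_away' indicator, but first we'll change the column names so they become 'score_home' (instead of 'home_score'):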

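import pandas as pd

# Swap column stubs around '_' (e.g. 'home_score' -> 'score_home')
df.columns = ['_'.join(x[::-1]) for x in df.columns.str.split('_')]

# Your filter; it would drop every row of the provided example, so it is commented out
# df['date'] = pd.to_datetime(df['date'])
# df = df[df['date'].dt.year.ge(2010) & df['tournament'].ne('Friendly')]

# i must uniquely identify each row: 'date' does in this sample, but the full
# dataset has several matches on the same day, so a unique match id is needed there
df = pd.wide_to_long(df, i='date', j='home_or_away',
                     stubnames=['team', 'score'], sep='_', suffix='.*')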

#                          country  neutral tournament     city      team  score
#date       home_or_away                                                        
#1872-11-30 home          Scotland    False   Friendly  Glasgow  Scotland      0
#1873-03-08 home           England    False   Friendly   London   England      4
#1874-03-07 home          Scotland    False   Friendly  Glasgow  Scotland      2
#...
#1878-03-02 away          Scotland    False   Friendly  Glasgow   England      2
#1878-03-23 away          Scotland    False   Friendly  Glasgow     Wales      0
#1879-01-18 away           England    False   Friendly   London     Wales      1

So now you can sum the scores per team, regardless of whether they were playing at home or away:

df.groupby('team')['score'].sum()
#team
#England     12
#Scotland    34
#Wales        1
#Name: score, dtype: int64
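Both answers stop at per-team totals for the sample, while the question asks which country scored the most in non-friendly matches since 2010. A minimal end-to-end sketch combining the filter and the reshape, assuming the column names from the question. It adds a hypothetical `match_id` column because `wide_to_long` requires `i` to identify each row uniquely, and in the full dataset several matches share a date; it also renames the four team/score columns explicitly rather than reversing every column name. The four-match frame is made up for illustration.

```python
import pandas as pd

# Hypothetical matches; two of them share a date
df = pd.DataFrame({
    "date": ["2009-06-01", "2012-07-14", "2012-07-14", "2016-06-20"],
    "home_team": ["Scotland", "England", "Wales", "England"],
    "away_team": ["England", "Scotland", "Scotland", "Wales"],
    "home_score": [5, 2, 0, 3],
    "away_score": [1, 2, 1, 0],
    "tournament": ["UEFA Euro qualification", "FIFA World Cup qualification",
                   "FIFA World Cup qualification", "UEFA Euro"],
})
df["date"] = pd.to_datetime(df["date"])

# Non-friendly matches on or after 2010-01-01
recent = df[(df["tournament"] != "Friendly") & (df["date"] >= "2010-01-01")]

# Hypothetical unique id per match, since 'date' alone is not unique
recent = recent.assign(match_id=range(len(recent)))

# Put the stub first ('home_team' -> 'team_home') so wide_to_long can split it
recent = recent.rename(columns={
    "home_team": "team_home", "away_team": "team_away",
    "home_score": "score_home", "away_score": "score_away",
})

# One row per (match, side): columns 'team' and 'score', plus date/tournament
long = pd.wide_to_long(recent, stubnames=["team", "score"], i="match_id",
                       j="home_or_away", sep="_", suffix=r"\w+")

# Total goals per team, highest first, and the top scorer
goals = long.groupby("team")["score"].sum().sort_values(ascending=False)
print(goals)
print(goals.idxmax())
```

`idxmax` returns the first label with the maximum, so ties would need separate handling if they matter.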
