How do I summarize items in a DataFrame?

Posted 2024-06-16 10:27:09


I have a DataFrame like this:

team1   team2   winner
KKR     RCB     KKR
CSK     KXIP    CSK
RR      DD      DD
MI      KKR     KKR
DC      KKR     KKR
KXIP    RR      RR
DC      DD      DD
MI      KKR     KKR.... 

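What I want to check is how many times one team has beaten another team in the tournament. For example, MI has beaten KKR twice, so the output should be: MI vs KKR = MI:2 KKR:0

I can do this by hand, taking two teams at a time, but that takes much longer. Can anyone help?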


2 Answers

If the order of the teams is not consistent across the dataset, you first need to define a match column:

df['match'] = df[['team1', 'team2']].apply(
    lambda row: tuple(sorted(row.values)), 
    axis=1
)

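The tuple is required for grouping because it is hashable, whereas a list or array is not.

It is not clear what output you want, but this will get you close to your result: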
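To make the hashability point concrete, here is a small plain-Python sketch. The variable names are illustrative and not part of the answer. It shows that sorting gives both team orders the same key, and that only the tuple form can be hashed:

```python
# Sorting makes the key independent of which team is listed first.
pair_list = sorted(['MI', 'KKR'])      # ['KKR', 'MI']
pair_tuple = tuple(pair_list)          # ('KKR', 'MI')
same_key = tuple(sorted(['KKR', 'MI'])) == pair_tuple

# A list is mutable and therefore unhashable; a tuple of strings is hashable.
try:
    hash(pair_list)
    list_hashable = True
except TypeError:
    list_hashable = False

tuple_hash = hash(pair_tuple)          # works without raising
```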

df.groupby('match')['winner'].value_counts()

Output:

match        winner
(CSK, KXIP)  CSK       1
(DC, DD)     DD        1
...

Assuming the order of the teams is always the same for a given pair:

df.groupby(['team1','team2']).apply(lambda x: str(sum(x.winner == x.team1))+':'+str(sum(x.winner == x.team2)))

Without that assumption, this would be one solution, using a df created like so:

import pandas as pd

df = pd.DataFrame({'team1': ['KKR','CSK','RR','MI','DC','KXIP','DC','MI','KKR'],
                   'team2': ['RCB','KXIP','DD','KKR','KKR','RR','DD','KKR','MI'],
                   'winner': ['KKR','CSK','DD','KKR','KKR','RR','DD','KKR','MI']})


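# Sort each pair so that (MI, KKR) and (KKR, MI) land in the same group.
# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement.
teamSort = [sorted(item) for item in df[['team1','team2']].to_numpy()]
df[['team1','team2']] = teamSort

# Per pair, count each side's wins and format them as "team1_wins:team2_wins".
df = df.groupby(['team1','team2']).apply(lambda x: str(sum(x.winner == x.team1))+':'+str(sum(x.winner == x.team2))).reset_index(name='score')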

Output:

  team1 team2 score
0   CSK  KXIP   1:0
1    DC    DD   0:1
2    DC   KKR   0:1
3    DD    RR   1:0
4   KKR    MI   2:1
5   KKR   RCB   1:0
6  KXIP    RR   0:1
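
For the exact "MI vs KKR = MI:2 KKR:0" style asked for in the question, one possible variation is sketched below. It is only a sketch: the helper columns `a`, `b`, `a_wins` and `b_wins` are made-up names, and it sums boolean win flags rather than calling `groupby(...).apply`, so it does not depend on the grouping columns being visible inside the lambda.

```python
import pandas as pd

df = pd.DataFrame({
    'team1': ['KKR', 'CSK', 'RR', 'MI', 'DC', 'KXIP', 'DC', 'MI', 'KKR'],
    'team2': ['RCB', 'KXIP', 'DD', 'KKR', 'KKR', 'RR', 'DD', 'KKR', 'MI'],
    'winner': ['KKR', 'CSK', 'DD', 'KKR', 'KKR', 'RR', 'DD', 'KKR', 'MI'],
})

# Put each pair in alphabetical order so (MI, KKR) and (KKR, MI) coincide.
pairs = [sorted(p) for p in zip(df['team1'], df['team2'])]
df['a'] = [p[0] for p in pairs]   # hypothetical helper column
df['b'] = [p[1] for p in pairs]   # hypothetical helper column

# Flag which side won each match; summing the flags counts the wins.
df['a_wins'] = (df['winner'] == df['a']).astype(int)
df['b_wins'] = (df['winner'] == df['b']).astype(int)

summary = df.groupby(['a', 'b'], as_index=False)[['a_wins', 'b_wins']].sum()

# One formatted line per pair, including the side with zero wins.
lines = [
    f"{r.a} vs {r.b} = {r.a}:{r.a_wins} {r.b}:{r.b_wins}"
    for r in summary.itertuples(index=False)
]
print("\n".join(lines))
```

Because the pairs are sorted alphabetically, the team named first in each line is not necessarily the one that appeared in team1 in the raw data.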
