合并多个数据帧,其中一些行不是sam

2024-06-25 23:19:13 发布

您现在位置:Python中文网/ 问答频道 /正文

所以我有5个来自国际足联13-17的前80名球员的数据帧,每个数据帧包含球员的名字、级别和俱乐部。我的最终目标是将所有这些数据集合并在一起,这样我就可以对每个玩家每年进行一次评级,如果他们没有成功的话,则为空值。很明显,有些球员并不是每年都能进入前80名的。 下面是三个数据帧的片段。你知道吗

国际足联18

Name Overall Club 0 Cristiano Ronaldo 94 Real Madrid CF 1 L. Messi 93 FC Barcelona 2 Neymar 92 FC Barcelona 3 L. Suárez 92 FC Barcelona 4 M. Neuer 92 FC Bayern Munich 5 De Gea 90 Manchester United 6 R. Lewandowski 90 FC Bayern Munich 7 J. Boateng 90 FC Bayern Munich 8 G. Bale 90 Real Madrid CF 9 Z. Ibrahimović 90 Manchester United 10 T. Courtois 89 Chelsea

国际足联13

Name Overall Club 0 L. Messi 94 FC Barcelona 1 Cristiano Ronaldo 92 Real Madrid CF 2 F. Ribéry 90 FC Bayern Munich 3 Xavi 90 FC Barcelona 4 Iniesta 90 FC Barcelona 5 N. Vidić 89 Manchester United 6 W. Rooney 89 Manchester United 7 Casillas 89 Real Madrid CF 8 David Silva 88 Manchester City 9 Falcao 88 Atlético Madrid 10 Z. Ibrahimović 88 Paris Saint-Germain

出现这种情况的一个例子是N.Vidić,他已经退休。你知道吗

我的目标是

Name FIFA17 FIA13 Club 0 Cristiano Ronaldo 94 92 Real Madrid CF 1 L. Messi 93 94 FC Barcelona 2 Neymar 92 83 FC Barcelona 3 L. Suárez 92 86 FC Barcelona 4 M. Neuer 92 87 FC Bayern Munich 5 De Gea 90 82 Manchester United 6 R. Lewandowski 90 80 FC Bayern Munich 7 J. Boateng 90 84 FC Bayern Munich 8 G. Bale 90 86 Real Madrid CF 9 Z. Ibrahimović 90 88 Manchester United 10 T. Courtois 89 83 Chelsea 11 F. Ribéry 86 90 FC Bayern Munich 12 Xavi 0 90 FC Barcelona 13 Iniesta 88 90 FC Barcelona 14 N. Vidić 0 89 Manchester United 15 W. Rooney 0 89 Manchester United 16 Casillas 0 89 Real Madrid CF 17 David Silva 87 88 Manchester City 18 Falcao 0 88 Atlético Madrid

我不熟悉python和pandas,但我尝试过使用join和merge,但似乎总是使用每个表的索引,而不是唯一的名称。你知道吗

任何帮助都将不胜感激!你知道吗


Tags: 数据namerealcfunitedfc球员club
2条回答

这里有一种通过pd.concatpivot_table的方法。它假设您能够将数据帧放入字典中,字典的长度可以是任意的。你知道吗

该解决方案还处理多个俱乐部,只保留最新的俱乐部。你知道吗

dfs = {13: df13, 18: df18}

df = pd.concat([dfs[k].assign(Year=k) for k in dfs])

club_map = df.sort_values('Year', ascending=False)\
             .drop_duplicates('Name')\
             .set_index('Name')['Club']

df['Club'] = df['Name'].map(club_map)

res = df.pivot_table(index=['Name', 'Club'], columns='Year',
                     values='Overall', aggfunc=np.sum, fill_value=0)\
        .reset_index().rename_axis(None, axis='columns')

结果

                 Name               Club  13  18
0            Casillas     Real Madrid CF  89   0
1   Cristiano Ronaldo     Real Madrid CF  92  94
2         David Silva    Manchester City  88   0
3              De Gea  Manchester United   0  90
4           F. Ribéry   FC Bayern Munich  90   0
5              Falcao    Atlético Madrid  88   0
6             G. Bale     Real Madrid CF   0  90
7             Iniesta       FC Barcelona  90   0
8          J. Boateng   FC Bayern Munich   0  90
9            L. Messi       FC Barcelona  94  93
10          L. Suárez       FC Barcelona   0  92
11           M. Neuer   FC Bayern Munich   0  92
12           N. Vidić  Manchester United  89   0
13             Neymar       FC Barcelona   0  92
14     R. Lewandowski   FC Bayern Munich   0  90
15        T. Courtois            Chelsea   0  89
16          W. Rooney  Manchester United  89   0
17               Xavi       FC Barcelona  90   0
18     Z. Ibrahimović  Manchester United  88  90

在带有^{}的列中对MultiIndex使用^{},然后用^{}替换NaNs,强制转换为integers,最后将MultiIndex转换为^{}列:

s1 = df1.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
s2 = df2.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
df = pd.concat([s2, s1], axis=1, keys=('FIFA13','FIFA18')).fillna(0).astype(int).reset_index()
print (df)
                 Name                 Club  FIFA13  FIFA18
0            Casillas       Real Madrid CF      89       0
1   Cristiano Ronaldo       Real Madrid CF      92      94
2         David Silva      Manchester City      88       0
3              De Gea    Manchester United       0      90
4           F. Ribéry     FC Bayern Munich      90       0
5              Falcao      Atlético Madrid      88       0
6             G. Bale       Real Madrid CF       0      90
7             Iniesta         FC Barcelona      90       0
8          J. Boateng     FC Bayern Munich       0      90
9            L. Messi         FC Barcelona      94      93
10          L. Suárez         FC Barcelona       0      92
11           M. Neuer     FC Bayern Munich       0      92
12           N. Vidić    Manchester United      89       0
13             Neymar         FC Barcelona       0      92
14     R. Lewandowski     FC Bayern Munich       0      90
15        T. Courtois             Chelsean       0      89
16          W. Rooney    Manchester United      89       0
17               Xavi         FC Barcelona      90       0
18     Z. Ibrahimović    Manchester United       0      90
19     Z. Ibrahimović  Paris Saint-Germain      88       0

如果顺序是重要的,解决方案是相似的,只获得唯一对NamesClub,通过^{}^{}连接在一起并删除重复项:

s1 = df1.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
s2 = df2.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall']
df = pd.concat([s2, s1], axis=1, keys=('FIFA13','FIFA18')).fillna(0).astype(int)

idx = pd.concat([df1[['Name','Club']], df2[['Name','Club']]]).drop_duplicates()
df = df.reindex(idx).reset_index().drop_duplicates('Name', keep='last')
print (df)
                 Name               Club  FIFA13  FIFA18
0            L. Messi       FC Barcelona      94      93
1   Cristiano Ronaldo     Real Madrid CF      92      94
2           F. Ribéry   FC Bayern Munich      90       0
3                Xavi       FC Barcelona      90       0
4             Iniesta       FC Barcelona      90       0
5            N. Vidić  Manchester United      89       0
6           W. Rooney  Manchester United      89       0
7            Casillas     Real Madrid CF      89       0
8         David Silva    Manchester City      88       0
9              Falcao    Atlético Madrid      88       0
11             Neymar       FC Barcelona       0      92
12          L. Suárez       FC Barcelona       0      92
13           M. Neuer   FC Bayern Munich       0      92
14             De Gea  Manchester United       0      90
15     R. Lewandowski   FC Bayern Munich       0      90
16         J. Boateng   FC Bayern Munich       0      90
17            G. Bale     Real Madrid CF       0      90
18     Z. Ibrahimović  Manchester United       0      90
19        T. Courtois           Chelsean       0      89

对于一般解决方案,请使用list comprehension

dfs = [df2, df1]
names= ['FIFA13','FIFA18']
s = [x.drop_duplicates(['Name','Club']).set_index(['Name','Club'])['Overall'] for x in dfs]
df = pd.concat(s, axis=1, keys=(names)).fillna(0).astype(int)
s1 = [x[['Name','Club']] for x in dfs]
idx = pd.concat(s1).drop_duplicates()
df = df.reindex(idx).reset_index().drop_duplicates('Name', keep='last')

相关问题 更多 >