Pandas: project a group's values onto every member of that group

Published 2024-05-20 21:38:12


I have the following df:


In [83]: df = pd.DataFrame({
    ...:     'x': [1,1,1,2,2,2],
    ...:     'y': ['event', 'checkpoint', 'name'] * 2,
    ...:     'z': ['half_marathon', 21, 'healthy_run', 'full_marathon', 42, 'worthy_run']
    ...:     })

In [84]: df
Out[84]: 
   x           y              z
0  1       event  half_marathon
1  1  checkpoint             21
2  1        name    healthy_run
3  2       event  full_marathon
4  2  checkpoint             42
5  2        name     worthy_run

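I want each distinct value of y to become its own column, with the matching z value projected onto every row of the same x group. The desired df looks like this: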

In [85]: desired_df
Out[85]: 
   x           y              z          event  checkpoint         name
0  1       event  half_marathon  half_marathon          21  healthy_run
1  1  checkpoint             21  half_marathon          21  healthy_run
2  1        name    healthy_run  half_marathon          21  healthy_run
3  2       event  full_marathon  full_marathon          42   worthy_run
4  2  checkpoint             42  full_marathon          42   worthy_run
5  2        name     worthy_run  full_marathon          42   worthy_run

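I tried GroupBy, but I am not sure how to project a value such as checkpoint in the example below so that it fills every row of its group. What is the best way to project the checkpoint value onto each member of the group? Thanks.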

In [87]: _df = df.groupby(['y'])['z'].get_group('checkpoint')

In [88]: df.join(_df, rsuffix='joined_z')
Out[88]: 
   x           y              z zjoined_z
0  1       event  half_marathon       NaN
1  1  checkpoint             21        21
2  1        name    healthy_run       NaN
3  2       event  full_marathon       NaN
4  2  checkpoint             42        42
5  2        name     worthy_run       NaN

1 answer
User
#1 · Posted 2024-05-20 21:38:12

Use DataFrame.join together with DataFrame.pivot:

# pivot's arguments are keyword-only in pandas 2.0 and later
df = df.join(df.pivot(index='x', columns='y', values='z'), on='x')
print (df)
   x           y              z checkpoint          event         name
0  1       event  half_marathon         21  half_marathon  healthy_run
1  1  checkpoint             21         21  half_marathon  healthy_run
2  1        name    healthy_run         21  half_marathon  healthy_run
3  2       event  full_marathon         42  full_marathon   worthy_run
4  2  checkpoint             42         42  full_marathon   worthy_run
5  2        name     worthy_run         42  full_marathon   worthy_run
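
To see why this works, it can help to look at the intermediate pivot result on its own. The following is a minimal sketch rather than the answerer's exact code; the names wide and result are introduced here for illustration, and it assumes each (x, y) pair appears only once.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 1, 1, 2, 2, 2],
    'y': ['event', 'checkpoint', 'name'] * 2,
    'z': ['half_marathon', 21, 'healthy_run', 'full_marathon', 42, 'worthy_run'],
})

# Step 1: reshape to one row per x and one column per distinct y.
# The new columns come out in sorted order: checkpoint, event, name.
wide = df.pivot(index='x', columns='y', values='z')
print(wide)

# Step 2: join matches each row's x against the index of `wide`,
# so the single wide row for a group is repeated on every member row.
result = df.join(wide, on='x')
print(result)
```

If an (x, y) pair occurs more than once, pivot raises a ValueError because the reshape is ambiguous. In that case pivot_table with aggfunc='first' is one possible fallback, at the cost of silently dropping the extra values.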

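If the new columns should follow the order in which the values appear in the original column y, add DataFrame.reindex with Series.unique: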

# pivot's arguments are keyword-only in pandas 2.0 and later
df = df.join(df.pivot(index='x', columns='y', values='z').reindex(df['y'].unique(), axis=1), on='x')
print (df)
   x           y              z          event checkpoint         name
0  1       event  half_marathon  half_marathon         21  healthy_run
1  1  checkpoint             21  half_marathon         21  healthy_run
2  1        name    healthy_run  half_marathon         21  healthy_run
3  2       event  full_marathon  full_marathon         42   worthy_run
4  2  checkpoint             42  full_marathon         42   worthy_run
5  2        name     worthy_run  full_marathon         42   worthy_run
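
If only a single value needs projecting, such as checkpoint in the question's attempt, a full pivot is not strictly necessary. The sketch below builds a small lookup Series and maps it back onto x. It assumes each x group has exactly one checkpoint row, and the name checkpoint_by_x is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 1, 1, 2, 2, 2],
    'y': ['event', 'checkpoint', 'name'] * 2,
    'z': ['half_marathon', 21, 'healthy_run', 'full_marathon', 42, 'worthy_run'],
})

# Lookup table: the checkpoint value for each group key x
# (assumes exactly one checkpoint row per x).
checkpoint_by_x = df.loc[df['y'] == 'checkpoint'].set_index('x')['z']

# Broadcast the value to every row by looking up each row's x.
df['checkpoint'] = df['x'].map(checkpoint_by_x)
print(df)
```

This also shows why the join in the question left NaN values: joining on the row index only fills the rows where the checkpoint originally sat, whereas a lookup keyed on x reaches every row in the group.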
