Why does a lambda function return both a DataFrame and a Series?

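import pandas as pd

df = pd.DataFrame({
    'label': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c'],
    't': [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2],
    'x': [48, 6, 30, 30, 53, 48, 25, 51, 9, 55, 2],
})

# For each value of t, take the index label of the row with the largest x,
# then keep the first three results.
top3 = lambda x: x.groupby('t')['x'].idxmax().head(3)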

print(df.groupby('label').apply(top3))
label  t
a      1     0
       2     1
       3     2
b      1     5
       2     6
       3     7
c      1     9
       2    10
Name: x, dtype: int64

df2 = df[df.label=='a']
print(df2.groupby('label').apply(top3))
t      1  2  3
label
a      0  1  2

df3 = df[df.label.isin(['a', 'b'])]
print(df3.groupby('label').apply(top3))
t      1  2  3
label
a      0  1  2
b      5  6  7

1 Answer

User

#1 · Posted on 2024-09-27 09:30:00

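There is a lot of magic behind .groupby.apply(): it tries to coerce the per-group results into whatever shape it thinks is best. When c is excluded from the DataFrame you pass in, every remaining group returns a result with the same index (t = 1, 2, 3), so it can coerce them into a clean rectangular DataFrame. When c is included, its result has only two entries (t = 1, 2), the results no longer line up, and it falls back to a Series with a MultiIndex: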

In [71]: df[df.label.isin(['a', 'c'])].groupby('label').apply(top3)
Out[71]:
label  t
a      1     0
       2     1
       3     2
c      1     9
       2    10
Name: x, dtype: int64
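
One way to see why the shape changes is to call the function on each group by hand and compare what comes back. The sketch below reuses the df and top3 from the question; the name per_group_index is only illustrative and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'label': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c'],
    't': [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2],
    'x': [48, 6, 30, 30, 53, 48, 25, 51, 9, 55, 2],
})
top3 = lambda x: x.groupby('t')['x'].idxmax().head(3)

# Call top3 on each group separately and record the index of the
# Series it returns (the index holds the t values).
per_group_index = {
    label: top3(group).index.tolist()
    for label, group in df.groupby('label')
}
print(per_group_index)
```

Groups a and b both produce the index [1, 2, 3], while c produces only [1, 2]. Any subset of labels whose results share one index can be stacked as rows of a DataFrame; as soon as one group differs, that is no longer possible.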

If you want to go down the rabbit hole in the pandas source, you can start here: https://github.com/pandas-dev/pandas/blob/30362ed828bebdd58d4f1f74d70236d32547d52a/pandas/core/groupby/ops.py#L189
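
If a predictable return type matters more than using apply, one option is to sidestep the shape inference. The following is only a sketch under that assumption, not something taken from the pandas source: it groups by both columns so the result is always a Series with a (label, t) MultiIndex, then keeps the first three rows per label. The names idx, first3 and first3_a are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'label': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c'],
    't': [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2],
    'x': [48, 6, 30, 30, 53, 48, 25, 51, 9, 55, 2],
})

# idxmax per (label, t) pair: a Series indexed by a two-level MultiIndex.
idx = df.groupby(['label', 't'])['x'].idxmax()

# Keep the first three t values for each label; head() keeps the MultiIndex.
first3 = idx.groupby(level='label').head(3)
print(first3)

# The same steps on a single-label subset still give a Series.
only_a = df[df.label == 'a']
first3_a = (
    only_a.groupby(['label', 't'])['x'].idxmax()
    .groupby(level='label').head(3)
)
print(type(first3_a).__name__)
```

When a rectangular view is wanted, first3.unstack('t') should produce one row per label, with NaN where a label has fewer than three values of t.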
