Pandas: assign the result of groupby to a new column of the DataFrame

df = pd.DataFrame({'size': list('SSMMMLS'), 'weight': [8, 10, 11, 1, 20, 14, 12], 'adult' : [False] * 5 + [True] * 2})

   adult size  weight
0  False    S       8
1  False    S      10
2  False    M      11
3  False    M       1
4  False    M      20
5   True    L      14
6   True    S      12

3 answers

User

Answer 1 · edited 2024-07-02 14:04:01

You can use transform together with loc and values:

>>> df["size2"] = df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")].values
>>> df
   adult size  weight size2
0  False    S       8     M
1  False    S      10     M
2  False    M      11     M
3  False    M       1     M
4  False    M      20     M
5   True    L      14     L
6   True    S      12     L

Step by step, we first find the appropriate indices:

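>>> df.groupby("adult")["weight"].transform("idxmax")
0    4
1    4
2    4
3    4
4    4
5    5
6    5
Name: weight, dtype: int64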
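To see why transform is used here rather than a plain aggregation, the two can be compared on the same data. This is a small sketch assuming a recent pandas; grouped, per_group and per_row are illustrative names, not from the answer.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

grouped = df.groupby("adult")["weight"]

# Aggregation: one row label per group, indexed by False/True
per_group = grouped.idxmax()

# transform: the same label broadcast back to every row of df
per_row = grouped.transform("idxmax")

print(per_group.tolist())  # [4, 5]
print(per_row.tolist())    # [4, 4, 4, 4, 4, 5, 5]
```

Only the transform result has the same length and index as df, which is what allows it to be used row by row in the next step.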

Then we use these labels with loc to index into the size column:

>>> df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")]
4    M
4    M
4    M
4    M
4    M
5    L
5    L
Name: size, dtype: object

Finally, we use .values so that the index doesn't get in the way when we assign:

>>> df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")].values
array(['M', 'M', 'M', 'M', 'M', 'L', 'L'], dtype=object)
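The effect of .values can be checked directly: the selected Series carries the repeated labels 4 and 5, and stripping them leaves a plain array that is assigned by position. A sketch assuming pandas 0.24 or later, where to_numpy() is the documented alternative to .values; idx and picked are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

idx = df.groupby("adult")["weight"].transform("idxmax")
picked = df["size"].loc[idx]

# picked is indexed by the repeated labels, not by 0..6
print(list(picked.index))  # [4, 4, 4, 4, 4, 5, 5]

# to_numpy() drops those labels, so the values are assigned by position
df["size2"] = picked.to_numpy()
print(df["size2"].tolist())  # ['M', 'M', 'M', 'M', 'M', 'L', 'L']
```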
>>> df["size2"] = df["size"].loc[df.groupby("adult")["weight"].transform("idxmax")].values
>>> df
   adult size  weight size2
0  False    S       8     M
1  False    S      10     M
2  False    M      11     M
3  False    M       1     M
4  False    M      20     M
5   True    L      14     L
6   True    S      12     L

User

Answer 2 · edited 2024-07-02 14:04:01

Just a more detailed description of @jazrael's answer, using your DataFrame:

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult' : [False] * 5 + [True] * 2})
#    adult size  weight
# 0  False    S       8
# 1  False    S      10
# 2  False    M      11
# 3  False    M       1
# 4  False    M      20
# 5   True    L      14
# 6   True    S      12

To get the size value of the row with the maximum weight:

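>>> def size4max_weight(subf):
...     """subf is the sub-DataFrame for one 'adult' group"""
...     return subf['size'][subf['weight'].idxmax()]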
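One detail worth knowing about this kind of helper: idxmax returns the label of the first maximum, so if two rows in a group tie on weight, the earlier row's size wins. A minimal check on a hypothetical three-row frame, tied (not from the question); the helper is written here with .loc, which does the same label lookup as subf['size'][...].

```python
import pandas as pd

def size4max_weight(subf):
    # label of the first row with the largest weight, then that row's size
    return subf.loc[subf["weight"].idxmax(), "size"]

# Hypothetical data: rows 1 and 2 tie on the maximum weight
tied = pd.DataFrame({"size": ["S", "M", "L"],
                     "weight": [5, 9, 9],
                     "adult": [False] * 3})

print(size4max_weight(tied))  # M  (row 1 comes before row 2)
```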

The groupby on 'adult' produces a Series whose index values are False and True:

>>> size2_col = df.groupby('adult').apply(size4max_weight)
>>> type(size2_col), size2_col.index
(pandas.core.series.Series, Index([False, True], dtype='object', name=u'adult'))

Using reset_index, we turn the Series into a DataFrame:

>>> size2_col = df.groupby('adult').apply(size4max_weight).reset_index(name='size2')
>>> size2_col
   adult size2
0  False     M
1   True     L

Then pd.merge on 'adult':

>>> pd.merge(df, size2_col, on=['adult'])
   adult size  weight size2
0  False    S       8     M
1  False    S      10     M
2  False    M      11     M
3  False    M       1     M
4  False    M      20     M
5   True    L      14     L
6   True    S      12     L
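A variant that avoids the merge step: build a small adult-to-size lookup from the max-weight rows and map it onto the adult column. Unlike pd.merge, which returns a new DataFrame, this assigns the column in place and keeps df's original index. This is an alternative sketch, not part of the answer above; best_rows and lookup are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# Row label of the heaviest row in each 'adult' group
best_rows = df.groupby("adult")["weight"].idxmax()

# Series indexed by False/True holding the size of that row
lookup = df.loc[best_rows, ["adult", "size"]].set_index("adult")["size"]

# Each row looks up the size for its own 'adult' value
df["size2"] = df["adult"].map(lookup)
print(df)
```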

User

Answer 3 · edited 2024-07-02 14:04:01

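You can use groupby with apply and idxmax, and then merge the result back onto df. I think the first value in size2 should be M, because the maximum weight in the False group is 20.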

import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult' : [False] * 5 + [True] * 2})

print(df)
   adult size  weight
0  False    S       8
1  False    S      10
2  False    M      11
3  False    M       1
4  False    M      20
5   True    L      14
6   True    S      12

print(df.groupby('adult').apply(lambda subf: subf['size'][subf['weight'].idxmax()]).reset_index(name='size2'))
   adult size2
0  False     M
1   True     L

print(pd.merge(df, df.groupby('adult').apply(lambda subf: subf['size'][subf['weight'].idxmax()]).reset_index(name='size2'), on=['adult']))
   adult size  weight size2
0  False    S       8     M
1  False    S      10     M
2  False    M      11     M
3  False    M       1     M
4  False    M      20     M
5   True    L      14     L
6   True    S      12     L
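The per-group lambda can also be replaced by a single idxmax on the grouped weight column, selecting the heaviest rows with loc before merging. A sketch assuming a recent pandas and Python 3; best and result are illustrative names. Like the merge above, result is a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'size': list('SSMMMLS'),
                   'weight': [8, 10, 11, 1, 20, 14, 12],
                   'adult': [False] * 5 + [True] * 2})

# Heaviest row per 'adult' group, keeping only the key and its size
best = (df.loc[df.groupby("adult")["weight"].idxmax(), ["adult", "size"]]
          .rename(columns={"size": "size2"}))

# Attach size2 to every row of the matching group (inner join keeps left order)
result = pd.merge(df, best, on="adult")
print(result)
```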
