How do I add a condition in split-apply-combine and repeat the result on every row?

cluster  tag  amount  name
1        0    200     Michael
2        1    1200    John
2        1    900     Daniel
2        0    3000    David
2        0    600     Jonny
3        0    900     Denisse
3        1    900     Mike
3        1    3000    Kely
3        0    2000    Devon

cluster  tag  amount  name     highest_amount
1        0    200     Michael  NaN
2        1    1200    John     John
2        1    900     Daniel   John
2        0    3000    David    John
2        0    600     Jonny    John
3        0    900     Denisse  Kely
3        1    900     Mike     Kely
3        1    3000    Kely     Kely
3        0    2000    Devon    Kely

cluster  tag  amount  name     highest_amount
1        0    200     Michael  NaN
2        1    1200    John     John
2        1    900     Daniel   John
2        0    3000    David    NaN
2        0    600     Jonny    NaN
3        0    900     Denisse  NaN
3        1    900     Mike     Kely
3        1    3000    Kely     Kely
3        0    2000    Devon    NaN

1 answer

User

#1 · Posted 2024-10-03 06:28:05

You can do this in two stages. First compute a mapping Series from cluster to name, then map it onto the cluster column:

s = df.query('tag == 1')\
      .sort_values('amount', ascending=False)\
      .drop_duplicates('cluster')\
      .set_index('cluster')['name']

df['highest_name'] = df['cluster'].map(s)

print(df)

   cluster  tag  amount     name highest_name
0        1    0     200  Michael          NaN
1        2    1    1200     John         John
2        2    1     900   Daniel         John
3        2    0    3000    David         John
4        2    0     600    Jonny         John
5        3    0     900  Denisse         Kely
6        3    1     900     Mike         Kely
7        3    1    3000     Kely         Kely
8        3    0    2000    Devon         Kely
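
To see what the intermediate mapping holds, here is a minimal self-contained sketch that rebuilds the sample frame from the question and runs the same two stages. The frame construction is an assumption based on the posted table; the rest follows the snippet above.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'cluster': [1, 2, 2, 2, 2, 3, 3, 3, 3],
    'tag':     [0, 1, 1, 0, 0, 0, 1, 1, 0],
    'amount':  [200, 1200, 900, 3000, 600, 900, 900, 3000, 2000],
    'name':    ['Michael', 'John', 'Daniel', 'David', 'Jonny',
                'Denisse', 'Mike', 'Kely', 'Devon'],
})

# Stage 1: keep only tag == 1 rows, put the largest amounts first,
# then keep the first (largest) row of each cluster.
s = (df.query('tag == 1')
       .sort_values('amount', ascending=False)
       .drop_duplicates('cluster')
       .set_index('cluster')['name'])

# s maps cluster -> name: {2: 'John', 3: 'Kely'}.
# Cluster 1 has no tag == 1 row, so it is absent from s.
mapping = s.to_dict()

# Stage 2: look up every row's cluster in s; missing clusters become NaN.
df['highest_name'] = df['cluster'].map(s)
```

Because the lookup is keyed on cluster alone, rows with tag == 0 also receive their cluster's name, which is what repeats the result on every row.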

If you want to use groupby, here is one way:

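import numpy as np

def func(x):
    # Among this cluster's tag == 1 rows, take the name with the largest amount.
    names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
    # Clusters with no tag == 1 rows get NaN.
    return names.iloc[0] if not names.empty else np.nan

df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))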
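
As a possible variant that is not part of the answer above, the per-cluster winner can also be picked with `idxmax` instead of sorting or a custom function. This is only a sketch under the assumption that the frame looks like the sample in the question; `tagged`, `top_rows` and `winners` are illustrative names.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'cluster': [1, 2, 2, 2, 2, 3, 3, 3, 3],
    'tag':     [0, 1, 1, 0, 0, 0, 1, 1, 0],
    'amount':  [200, 1200, 900, 3000, 600, 900, 900, 3000, 2000],
    'name':    ['Michael', 'John', 'Daniel', 'David', 'Jonny',
                'Denisse', 'Mike', 'Kely', 'Devon'],
})

# Apply the condition first, so only tag == 1 rows can win.
tagged = df[df['tag'] == 1]

# idxmax returns, per cluster, the index label of the row with the largest amount.
top_rows = tagged.loc[tagged.groupby('cluster')['amount'].idxmax()]

# Build the cluster -> name lookup and broadcast it to every row.
winners = top_rows.set_index('cluster')['name']
df['highest_name'] = df['cluster'].map(winners)
```

Clusters without any tag == 1 row never appear in `winners`, so `map` leaves them as NaN, matching the behaviour of the two approaches above.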
