Top 3 items to be shown in wide format in a pandas DataFrame

code  attribute  rank_count
 394   Feminine           9
 394      Fresh           9
 394      Heavy           8
 418       Soft          13
 418      Fresh          12
 418      Clean          11
 539      Fresh          14
 539       Soft          14
 539   Feminine          11
 555   Feminine           9
 555      Heavy           8
 555       Soft           7
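To try the answers below, the sample table can be rebuilt as a DataFrame. This is a minimal sketch: the values are copied from the table, and the name df matches the variable the answers use.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'code': [394, 394, 394, 418, 418, 418, 539, 539, 539, 555, 555, 555],
    'attribute': ['Feminine', 'Fresh', 'Heavy', 'Soft', 'Fresh', 'Clean',
                  'Fresh', 'Soft', 'Feminine', 'Feminine', 'Heavy', 'Soft'],
    'rank_count': [9, 9, 8, 13, 12, 11, 14, 14, 11, 9, 8, 7],
})
print(df)
```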

2 answers

User

Answer 1 · edited 2024-09-27 09:31:49

You can use:

import pandas as pd

df = (df.sort_values(['code','rank_count'], ascending=(True, False))
       # lambdas make both new columns use the sorted frame, so g follows descending rank_count
       .assign(attribute=lambda x: x['attribute'] + ' (' + x['rank_count'].astype(str) + ')',
               g=lambda x: x.groupby('code').cumcount() + 1)
       .query('g < 4')
       .set_index(['code','g'])['attribute']
       .unstack()
       .add_prefix('top')
       .rename_axis(None, axis=1)
       .reset_index())
print(df)
   code          top1        top2           top3
0   394  Feminine (9)   Fresh (9)      Heavy (8)
1   418     Soft (13)  Fresh (12)     Clean (11)
2   539    Fresh (14)   Soft (14)  Feminine (11)
3   555  Feminine (9)   Heavy (8)       Soft (7)

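Explanation:

First, sort_values by both columns
Join column attribute with rank_count, and add a new counter column with groupby + cumcount
Filter the top 3 with query, if necessary
Reshape with set_index and unstack
Use add_prefix, rename_axis and reset_index for a cleaner final DataFrame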
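The counter in the second step comes from groupby + cumcount, which numbers rows in their current order. That is why the counter should be computed on the sorted frame: on unsorted input, position 1 would simply be the first row seen for each code. A minimal sketch, using a deliberately shuffled subset of the sample rows (an assumption for illustration):

```python
import pandas as pd

# A few rows from the question, deliberately shuffled so the input is NOT pre-sorted
df = pd.DataFrame({
    'code': [418, 394, 418, 394, 418, 394],
    'attribute': ['Clean', 'Heavy', 'Soft', 'Feminine', 'Fresh', 'Fresh'],
    'rank_count': [11, 8, 13, 9, 12, 9],
})

# Counter taken from the original frame: follows input order, not rank_count
g_unsorted = df.groupby('code').cumcount() + 1

# Counter taken from the sorted frame: 1 = highest rank_count within each code
s = df.sort_values(['code', 'rank_count'], ascending=[True, False])
g_sorted = s.groupby('code').cumcount() + 1

# Both Series align on the row index, so they can be compared side by side
print(s.assign(g_unsorted=g_unsorted, g_sorted=g_sorted))
```

Here 'Clean (11)' gets position 1 from the unsorted counter but position 3 from the sorted one.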

Edit:

Solution without assign:

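# sort first, so that cumcount below follows descending rank_count
df = df.sort_values(['code','rank_count'], ascending=(True, False))
# append the count to each attribute, e.g. 'Feminine (9)'
df['attribute'] = df['attribute'] + ' (' + df['rank_count'].astype(str) + ')'
# running position 1, 2, 3, ... within each code
df['g'] = df.groupby('code').cumcount() + 1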

df = (df.query('g < 4')
       .set_index(['code','g'])['attribute']
       .unstack()
       .add_prefix('top')
       .rename_axis(None, axis=1)
       .reset_index())
print (df)
   code          top1        top2           top3
0   394  Feminine (9)   Fresh (9)      Heavy (8)
1   418     Soft (13)  Fresh (12)     Clean (11)
2   539    Fresh (14)   Soft (14)  Feminine (11)
3   555  Feminine (9)   Heavy (8)       Soft (7)
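The reshape step turns the long (code, g) rows into one row per code. As a hedged sketch, the set_index + unstack call from the answer can be compared with DataFrame.pivot, a swapped-in alternative that performs the same reshape when each (code, g) pair is unique. The small long frame below is typed in from the output above:

```python
import pandas as pd

# Long frame after sorting, labelling and filtering (values taken from the output above)
long_df = pd.DataFrame({
    'code': [394, 394, 394, 418, 418, 418],
    'g': [1, 2, 3, 1, 2, 3],
    'attribute': ['Feminine (9)', 'Fresh (9)', 'Heavy (8)',
                  'Soft (13)', 'Fresh (12)', 'Clean (11)'],
})

# Reshape used in the answer: (code, g) MultiIndex, then move 'g' into the columns
via_unstack = long_df.set_index(['code', 'g'])['attribute'].unstack()

# Same reshape with DataFrame.pivot (requires unique (code, g) pairs)
via_pivot = long_df.pivot(index='code', columns='g', values='attribute')

# Cosmetic clean-up: top1..top3 headers, drop the 'g' axis name, code back to a column
wide = via_pivot.add_prefix('top').rename_axis(None, axis=1).reset_index()
print(wide)
```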

User

Answer 2 · edited 2024-09-27 09:31:49

Here is one way using collections.defaultdict.

from collections import defaultdict
from operator import itemgetter

import pandas as pd

d = defaultdict(list)

for code, attr, rank in df.itertuples(index=False):
    d[code].append((attr, rank))

d = {k: sorted(v, key=itemgetter(1), reverse=True)[:3] for k, v in d.items()}

res = pd.DataFrame(d).T.reset_index()

print(res)

   index              0            1               2
0    394  (Feminine, 9)   (Fresh, 9)      (Heavy, 8)
1    418     (Soft, 13)  (Fresh, 12)     (Clean, 11)
2    539    (Fresh, 14)   (Soft, 14)  (Feminine, 11)
3    555  (Feminine, 9)   (Heavy, 8)       (Soft, 7)

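You can rename the columns and apply other formatting as required. In my opinion, storing tuples is better than converting numeric data to strings.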
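For instance, the tuple result can take the same headers as the first answer. A minimal sketch that reruns the defaultdict approach on the sample data; the names code and top1 to top3 are only a naming choice borrowed from the first answer:

```python
from collections import defaultdict
from operator import itemgetter

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'code': [394, 394, 394, 418, 418, 418, 539, 539, 539, 555, 555, 555],
    'attribute': ['Feminine', 'Fresh', 'Heavy', 'Soft', 'Fresh', 'Clean',
                  'Fresh', 'Soft', 'Feminine', 'Feminine', 'Heavy', 'Soft'],
    'rank_count': [9, 9, 8, 13, 12, 11, 14, 14, 11, 9, 8, 7],
})

# Collect (attribute, rank) tuples per code, then keep the 3 highest ranks
d = defaultdict(list)
for code, attr, rank in df.itertuples(index=False):
    d[code].append((attr, rank))
d = {k: sorted(v, key=itemgetter(1), reverse=True)[:3] for k, v in d.items()}

res = pd.DataFrame(d).T.reset_index()

# Replace the default headers ('index', 0, 1, 2) with descriptive names
res.columns = ['code', 'top1', 'top2', 'top3']
print(res)
```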

If you really need a string representation…

you can use pd.Series.apply:

for col in [0, 1, 2]:
    res[col] = res[col].apply(lambda x: '{0} ({1})'.format(x[0], x[1]))
