Create a new column ranking the largest values, based on the value of another column

Posted 2024-09-28 05:40:30


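I have a DataFrame with two columns: "Goods" (the product name) and "Sales" (total sales). I need to add another column holding the rank of each sales value, counted 1, 2, 3, ..., where 1 is the largest value, 2 is the second largest, and so on.

I hope you can help me.

My DataFrame: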

import pandas as pd

lst = [['Keyboard1', 1860], ['Keyboard2', 1650], ['Keyboard3', 900], ['Keyboard4', 1230], ['Keyboard5', 1150], ['Keyboard6', 1345],
       ['Mouse1', 3100], ['Mouse2', 2900], ['Mouse3', 3050], ['Mouse4', 2750], ['Mouse5', 4100], ['Mouse6', 3910]]

df = pd.DataFrame(lst, columns=['Goods', 'Sales'])

        Goods  Sales
0   Keyboard1   1860
1   Keyboard2   1650
2   Keyboard3    900
3   Keyboard4   1230
4   Keyboard5   1150
5   Keyboard6   1345
6      Mouse1   3100
7      Mouse2   2900
8      Mouse3   3050
9      Mouse4   2750
10     Mouse5   4100
11     Mouse6   3910

I am trying the following code:

import pandas as pd
import numpy as np

df = df.sort_values('Sales', ascending = False)
df['Largest'] = np.arange(len(df))+1

But this ranks the sales across all goods together, and I need the rank within each kind of goods separately. My result is:

        Goods  Sales  Largest
10     Mouse5   4100        1
11     Mouse6   3910        2
6      Mouse1   3100        3
8      Mouse3   3050        4
7      Mouse2   2900        5
9      Mouse4   2750        6
0   Keyboard1   1860        7
1   Keyboard2   1650        8
5   Keyboard6   1345        9
3   Keyboard4   1230       10
4   Keyboard5   1150       11
2   Keyboard3    900       12

Here is the output I need:

        Goods  Sales  Largest
10     Mouse5   4100        1
11     Mouse6   3910        2
6      Mouse1   3100        3
8      Mouse3   3050        4
7      Mouse2   2900        5
9      Mouse4   2750        6
0   Keyboard1   1860        1
1   Keyboard2   1650        2
5   Keyboard6   1345        3
3   Keyboard4   1230        4
4   Keyboard5   1150        5
2   Keyboard3    900        6

3 Answers

Just do:

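# strip the trailing digits to get the product group
# (regex=True is required: pandas 2.0+ treats the pattern as a literal string by default)
df['goods_group'] = df['Goods'].str.replace(r'\d+$', '', regex=True)

# sort by the new column and sales
df = df.sort_values(['goods_group', 'Sales'], ascending=False)

# create largest column
df['largest'] = df.groupby('goods_group').cumcount() + 1

# drop the new column (the positional axis argument was removed in pandas 2.0)
res = df.drop(columns='goods_group')
print(res)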

Output:

        Goods  Sales  largest
10     Mouse5   4100        1
11     Mouse6   3910        2
6      Mouse1   3100        3
8      Mouse3   3050        4
7      Mouse2   2900        5
9      Mouse4   2750        6
0   Keyboard1   1860        1
1   Keyboard2   1650        2
5   Keyboard6   1345        3
3   Keyboard4   1230        4
4   Keyboard5   1150        5
2   Keyboard3    900        6
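As a possible alternative to sorting and counting, the rank can be computed directly within each group. This is only a sketch, not part of the answer above: it assumes the group is the product name with its trailing digits stripped, and `group_key` is a name introduced here for illustration. It leaves the rows in their original order.

```python
import pandas as pd

lst = [['Keyboard1', 1860], ['Keyboard2', 1650], ['Keyboard3', 900],
       ['Keyboard4', 1230], ['Keyboard5', 1150], ['Keyboard6', 1345],
       ['Mouse1', 3100], ['Mouse2', 2900], ['Mouse3', 3050],
       ['Mouse4', 2750], ['Mouse5', 4100], ['Mouse6', 3910]]
df = pd.DataFrame(lst, columns=['Goods', 'Sales'])

# Assumed group key: the product name with its trailing digits stripped.
group_key = df['Goods'].str.replace(r'\d+$', '', regex=True)

# Rank Sales within each group, 1 = largest.
# method='first' breaks ties by row order, so every row gets a distinct rank.
df['Largest'] = (
    df['Sales']
    .groupby(group_key)
    .rank(method='first', ascending=False)
    .astype(int)
)
print(df)
```

If the sorted layout from the question is wanted as well, calling `sort_values` afterwards should reproduce it.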

You can groupby Goods with the digits removed:

>>> df = df.sort_values('Sales', ascending=False)
>>> df
        Goods  Sales
10     Mouse5   4100
11     Mouse6   3910
6      Mouse1   3100
8      Mouse3   3050
7      Mouse2   2900
9      Mouse4   2750
0   Keyboard1   1860
1   Keyboard2   1650
5   Keyboard6   1345
3   Keyboard4   1230
4   Keyboard5   1150
2   Keyboard3    900
>>> df['Largest'] = df.groupby(df['Goods'].replace(r'\d+', '', regex=True)).cumcount() + 1
>>> df
        Goods  Sales  Largest
10     Mouse5   4100        1
11     Mouse6   3910        2
6      Mouse1   3100        3
8      Mouse3   3050        4
7      Mouse2   2900        5
9      Mouse4   2750        6
0   Keyboard1   1860        1
1   Keyboard2   1650        2
5   Keyboard6   1345        3
3   Keyboard4   1230        4
4   Keyboard5   1150        5
2   Keyboard3    900        6
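One thing to keep in mind with `cumcount` is that it numbers rows by position, so two goods with equal sales still get different numbers. The question's data has no ties, so this does not matter there. The sketch below uses made-up data (not from the question) to compare it with a dense rank, in case tied values should share a rank.

```python
import pandas as pd

# Hypothetical data containing a tie; not taken from the question.
df = pd.DataFrame({'Goods': ['Mouse1', 'Mouse2', 'Mouse3'],
                   'Sales': [3000, 3000, 2500]})
df = df.sort_values('Sales', ascending=False)

# Group key: Goods with the digits removed.
key = df['Goods'].replace(r'\d+', '', regex=True)

# cumcount numbers rows by position, so the tied rows get different values.
df['by_cumcount'] = df.groupby(key).cumcount() + 1

# A dense rank gives tied rows the same value and leaves no gaps.
df['by_dense_rank'] = (
    df['Sales'].groupby(key).rank(method='dense', ascending=False).astype(int)
)
print(df)
```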

Try adding the following lines at the end of your code:

df['new'] = df['Goods'].str[:-1]
df['Largest'] = df.groupby('new').cumcount() + 1
df = df.drop('new', axis=1)
print(df)

Output (shown before the helper column `new` is dropped):

        Goods  Sales       new  Largest
10     Mouse5   4100     Mouse        1
11     Mouse6   3910     Mouse        2
6      Mouse1   3100     Mouse        3
8      Mouse3   3050     Mouse        4
7      Mouse2   2900     Mouse        5
9      Mouse4   2750     Mouse        6
0   Keyboard1   1860  Keyboard        1
1   Keyboard2   1650  Keyboard        2
5   Keyboard6   1345  Keyboard        3
3   Keyboard4   1230  Keyboard        4
4   Keyboard5   1150  Keyboard        5
2   Keyboard3    900  Keyboard        6
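Note that `str[:-1]` removes exactly one character, which fits the question's data because every name ends in a single digit. A small sketch with hypothetical two-digit names (not from the question) shows where it would break down and how stripping all trailing digits avoids that:

```python
import pandas as pd

# Hypothetical names with longer numeric suffixes; not from the question's data.
goods = pd.Series(['Mouse9', 'Mouse10', 'Keyboard12'])

# Slicing off one character leaves a stray digit on multi-digit suffixes.
sliced = goods.str[:-1]

# Stripping every trailing digit recovers the bare product name.
stripped = goods.str.replace(r'\d+$', '', regex=True)

print(sliced.tolist())    # ['Mouse', 'Mouse1', 'Keyboard1']
print(stripped.tolist())  # ['Mouse', 'Mouse', 'Keyboard']
```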
