How to sort a DataFrame by a specific column in a custom way (using a lambda function, as with sorting in the standard library)

Posted 2024-10-06 11:28:02


Given the following data:

import pandas as pd
import io

df = pd.read_csv(
    io.StringIO(
        "bit,val\nbit_0,40.9\nbit_1,49.6\nbit_2,50.5\nbit_3,37.7\nbit_4,52.0\nbit_5,55.1\nbit_6,40.6\nbit_7,37.8\nbit_8,39.2\nbit_9,51.1\nbit_10,48.4\nbit_11,49.8\nbit_12,51.7\nbit_13,46.7\nbit_14,40.8\nbit_15,41.1\nbit_16,36.7\nbit_17,50.8\nbit_18,41.6\nbit_19,41.3\n"
    )
)

df = df.sample(len(df), random_state=1).reset_index(drop=True)  # shuffle the rows reproducibly

which looks like:

       bit   val
0    bit_3  37.7
1   bit_16  36.7
2    bit_6  40.6
3   bit_10  48.4
4    bit_2  50.5
5   bit_14  40.8
6    bit_4  52.0
7   bit_17  50.8
8    bit_7  37.8
9    bit_1  49.6
10  bit_13  46.7
11   bit_0  40.9
12  bit_19  41.3
13  bit_18  41.6
14   bit_9  51.1
15  bit_15  41.1
16   bit_8  39.2
17  bit_12  51.7
18  bit_11  49.8
19   bit_5  55.1

I want to sort the data by the bit column, according to the trailing number.

If this were a standard Python list, the following would work:

sorted(df["bit"].to_list(), key=lambda x: int(x.split("_")[-1]))

but I'm not sure how to apply this to a DataFrame.
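For reference, a minimal sketch (assuming pandas >= 1.1.0, where sort_values gained a key parameter) that reuses the question's per-element lambda unchanged. Because sort_values hands key the whole column as a Series, the lambda is applied element by element with Series.map. The five-row frame is just the first rows of the shuffled data above.

```python
import pandas as pd

# First five rows of the shuffled data from the question
df = pd.DataFrame({
    "bit": ["bit_3", "bit_16", "bit_6", "bit_10", "bit_2"],
    "val": [37.7, 36.7, 40.6, 48.4, 50.5],
})

# key receives the whole "bit" column; Series.map applies the original
# per-element lambda to each value and returns a Series of ints
result = df.sort_values(
    "bit",
    key=lambda s: s.map(lambda x: int(x.split("_")[-1])),
)
print(result)
```

Series.map calls the Python function once per row, so on large frames a fully vectorized key (such as the .str-based ones in the answers) is usually faster.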


3 answers

Use df.sort_values with a key that splits the strings with .str.split("_", expand=True) and casts the numeric part to int with .astype(int), as follows:

df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int))

Output:

       bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3
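One caveat with this key: str.split(..., expand=True)[1] picks the second piece, which equals the trailing number only when each value contains exactly one underscore, whereas the question's x.split("_")[-1] picks the last piece. A sketch using str.rsplit with n=1 follows; the value "my_bit_2" is made up purely to show the difference.

```python
import pandas as pd

s = pd.Series(["bit_3", "bit_16", "my_bit_2"])

# expand=True returns a DataFrame; column 1 is the SECOND piece,
# which for "my_bit_2" is "bit", not the trailing number
second = s.str.split("_", expand=True)[1]

# rsplit with n=1 splits only at the last underscore, so .str[-1]
# is the trailing piece, matching x.split("_")[-1] from the question
trailing = s.str.rsplit("_", n=1).str[-1].astype(int)

df = pd.DataFrame({"bit": s, "val": [37.7, 36.7, 50.5]})
out = df.sort_values(
    "bit", key=lambda x: x.str.rsplit("_", n=1).str[-1].astype(int)
)
print(second.tolist())
print(trailing.tolist())
print(out)
```

For the question's data both keys give the same order; the rsplit version only matters if the names can contain more than one underscore.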

If you need to reset the index, just add .reset_index(drop=True):

df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int)).reset_index(drop=True)

Output:

       bit   val
0    bit_0  40.9
1    bit_1  49.6
2    bit_2  50.5
3    bit_3  37.7
4    bit_4  52.0
5    bit_5  55.1
6    bit_6  40.6
7    bit_7  37.8
8    bit_8  39.2
9    bit_9  51.1
10  bit_10  48.4
11  bit_11  49.8
12  bit_12  51.7
13  bit_13  46.7
14  bit_14  40.8
15  bit_15  41.1
16  bit_16  36.7
17  bit_17  50.8
18  bit_18  41.6
19  bit_19  41.3
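As an alternative to chaining .reset_index(drop=True), sort_values also accepts ignore_index=True (available since pandas 1.0), which labels the result 0 to n-1 directly. A small check on the first five rows of the shuffled data; trailing_int is a hypothetical helper name for the key used above.

```python
import pandas as pd

df = pd.DataFrame({
    "bit": ["bit_3", "bit_16", "bit_6", "bit_10", "bit_2"],
    "val": [37.7, 36.7, 40.6, 48.4, 50.5],
})

def trailing_int(x: pd.Series) -> pd.Series:
    # Same key as the answer: second piece after splitting on "_", as int
    return x.str.split("_", expand=True)[1].astype(int)

# ignore_index=True relabels the sorted rows 0..n-1
a = df.sort_values("bit", key=trailing_int, ignore_index=True)
# The answer's approach: sort, then drop the old index
b = df.sort_values("bit", key=trailing_int).reset_index(drop=True)
print(a.equals(b))  # True
```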

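With pandas >= 1.1.0, you can pass key to sort_values much as you would to sorted(), with one difference: the key function is vectorized, i.e. it receives the whole column as a Series and must return a Series of the same length.
In my solution I sort by the bit column, but for the sort key I strip the bit_ prefix: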

df.sort_values(
    by='bit', 
    key=lambda x: x.str.replace('bit_', '').astype(int),
)

    bit     val
11  bit_0   40.9
9   bit_1   49.6
4   bit_2   50.5
0   bit_3   37.7
6   bit_4   52.0

Documentation for .sort_values():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
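If the prefix is not always exactly bit_, a regular expression can pull out the trailing digits instead of removing a fixed string. A sketch using Series.str.extract, assuming every value ends in at least one digit (otherwise extract yields NaN and .astype(int) raises); trailing_number is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    "bit": ["bit_3", "bit_16", "bit_6", "bit_10", "bit_2"],
    "val": [37.7, 36.7, 40.6, 48.4, 50.5],
})

def trailing_number(s: pd.Series) -> pd.Series:
    # Vectorized key: receives the whole "bit" column and returns a Series
    # of the same length. The regex keeps only the digits at the end of
    # each string; expand=False returns a Series for a single group.
    return s.str.extract(r"(\d+)$", expand=False).astype(int)

result = df.sort_values(by="bit", key=trailing_number)
print(result)
```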

Try natsort:

from natsort import index_natsorted
df = df.iloc[index_natsorted(df.bit)]
df
Out[195]: 
       bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3
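index_natsorted returns the integer positions that would sort the column naturally, which is why the result is indexed with .iloc. If adding the natsort dependency is not an option, the idea can be approximated with the standard library. The helpers natural_key and index_natural_sorted below are hypothetical names, and they only treat plain runs of digits as numbers; natsort itself offers many more options (for example for floats and signed numbers), so its results can differ on other data.

```python
import re

import pandas as pd

def natural_key(text):
    # Hypothetical helper: "bit_10" -> ["bit_", 10, ""], so numeric runs
    # compare as integers instead of character by character
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", text)]

def index_natural_sorted(values):
    # Hypothetical stand-in for index_natsorted: return the positions
    # (not index labels) that would sort the values naturally
    values = list(values)
    return sorted(range(len(values)), key=lambda i: natural_key(values[i]))

df = pd.DataFrame({
    "bit": ["bit_3", "bit_16", "bit_6", "bit_10", "bit_2"],
    "val": [37.7, 36.7, 40.6, 48.4, 50.5],
})
result = df.iloc[index_natural_sorted(df["bit"])]
print(result)
```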
