Reordering repeated columns in a specific order with pandas

Posted 2024-09-28 16:23:01

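I have a DataFrame whose columns and rows share repeated index names with ordered numeric suffixes. I want to reorder them so that they follow a specific order that I specify. I made a simple example by hand, shown below. This code generates df, the DataFrame with the repeated columns and rows. I want to use the vector order so that my new DataFrame df_new has its columns and rows in the order given by order (with the numeric suffixes in increasing order within each name).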

import pandas as pd

df = pd.DataFrame({0: [11, 3, 10, 9, 6, 9],
                   1: [12, 4, 11, 8, 5, 8],
                   2: [13, 5, 12, 7, 4, 7],
                   3: [14, 6, 13, 6, 3, 6],
                   4: [15, 7, 14, 5, 2, 5],
                   5: [16, 8, 15, 4, 1, 4]})
df = df.rename({0:"apple1", 1:"apple2",2:"banana1",3:"banana2",4:"pear1",5:"pear2"}, axis='columns')
df = df.rename({0:"apple1", 1:"apple2",2:"banana1",3:"banana2",4:"pear1",5:"pear2"}, axis='index')

order = ["banana", "pear", "apple"]

df_new = pd.DataFrame({2: [12, 7, 4, 7, 13, 5],
                       3: [13, 6, 3, 6, 14, 6],
                       4: [14, 5, 2, 5, 15, 7],
                       5: [15, 4, 1, 4, 16, 8],
                       0: [10, 9, 6, 9, 11, 3],
                       1: [11, 8, 5, 8, 12, 4]})
df_new = df_new.rename({0:"apple1", 1:"apple2",2:"banana1",3:"banana2",4:"pear1",5:"pear2"}, axis='columns')
# The rows of df_new are already in the target order, so label them accordingly
df_new = df_new.rename({0:"banana1", 1:"banana2",2:"pear1",3:"pear2",4:"apple1",5:"apple2"}, axis='index')

df
df_new

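I have posted an image of the desired result below (I was not sure how to include the output of the code above in this question). Can anyone give me a general way to reorder these repeated columns and rows, along with their corresponding cells, using a specific order? Thank you very much.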

[Image: desired result]


1 answer
User
Answer 1 · Posted 2024-09-28 16:23:01

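My approach:

Use str.extract to split each label into its alphabetic name and its numeric suffix, then build a MultiIndex from the two parts with MultiIndex.from_frame: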

# Extract the alphabetic and numeric parts and convert them to a MultiIndex.
# The pattern is a raw string so that \d is not read as a string escape.
df.columns = pd.MultiIndex.from_frame(
    df.columns.str.extract(r'([A-Za-z]+)(\d+)')
)
df.index = pd.MultiIndex.from_frame(
    df.index.str.extract(r'([A-Za-z]+)(\d+)')
)
0        apple     banana     pear    
1            1   2      1   2    1   2
0      1                              
apple  1    11  12     13  14   15  16
       2     3   4      5   6    7   8
banana 1    10  11     12  13   14  15
       2     9   8      7   6    5   4
pear   1     6   5      4   3    2   1
       2     9   8      7   6    5   4
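
To make the first step easier to follow, here is a minimal sketch of what str.extract returns for a small Index. The labels below are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Illustrative labels only (not the question's data)
labels = pd.Index(["apple1", "apple2", "banana1", "banana10"])

# One capture group per part: letters first, digits second.
# With the default expand=True the result is a DataFrame with columns 0 and 1.
parts = labels.str.extract(r"([A-Za-z]+)(\d+)")

# Each row of `parts` becomes one (name, suffix) tuple of the MultiIndex.
mi = pd.MultiIndex.from_frame(parts)
print(mi.tolist())
```

Note that the suffixes are still strings at this point, which matters for the optional numeric sort described later in the answer.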

Then reindex both the index and the columns with order at level=0:

order = ["banana", "pear", "apple"]
df = df.reindex(index=order, columns=order, level=0)
0        banana     pear     apple    
1             1   2    1   2     1   2
0      1                              
banana 1     12  13   14  15    10  11
       2      7   6    5   4     9   8
pear   1      4   3    2   1     6   5
       2      7   6    5   4     9   8
apple  1     13  14   15  16    11  12
       2      5   6    7   8     3   4
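
The effect of level=0 can be seen in isolation on a small Series. This is only a sketch with made-up values: the outer level is rearranged to follow the requested order, while the entries inside each group keep their relative order.

```python
import pandas as pd

# Illustrative two-level index: three names, two suffixes each
mi = pd.MultiIndex.from_product([["apple", "banana", "pear"], ["1", "2"]])
s = pd.Series(range(6), index=mi)

# Reorder by the outer level only; the inner level is carried along
reordered = s.reindex(["banana", "pear", "apple"], level=0)
print(reordered)
```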

Then collapse the MultiIndex back into flat labels with Index.map:

df.index = df.index.map(''.join)
df.columns = df.columns.map(''.join)
         banana1  banana2  pear1  pear2  apple1  apple2
banana1       12       13     14     15      10      11
banana2        7        6      5      4       9       8
pear1          4        3      2      1       6       5
pear2          7        6      5      4       9       8
apple1        13       14     15     16      11      12
apple2         5        6      7      8       3       4
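
As a small illustration of the collapsing step: mapping over a MultiIndex passes each entry to the function as a tuple, so ''.join glues the parts back together. str.join only accepts strings, so when a level holds integers each part has to be converted first. The indexes below are made up for the example.

```python
import pandas as pd

# Both levels are strings: ''.join works directly on each tuple
str_levels = pd.MultiIndex.from_tuples([("banana", "1"), ("pear", "2")])
flat_from_str = str_levels.map("".join)

# Second level is int: convert every part to str before joining
int_levels = pd.MultiIndex.from_tuples([("banana", 1), ("pear", 2)])
flat_from_int = int_levels.map(lambda parts: "".join(map(str, parts)))

print(flat_from_str.tolist())
print(flat_from_int.tolist())
```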

Complete code:

import pandas as pd

idx = ['apple1', 'apple2', 'banana1', 'banana2', 'pear1', 'pear2']
# Each inner list is one row, so this frame matches the question's df
df = pd.DataFrame(
    [[11, 12, 13, 14, 15, 16],
     [3, 4, 5, 6, 7, 8],
     [10, 11, 12, 13, 14, 15],
     [9, 8, 7, 6, 5, 4],
     [6, 5, 4, 3, 2, 1],
     [9, 8, 7, 6, 5, 4]],
    index=idx,
    columns=idx
)

# Split each label into (name, suffix) and build a MultiIndex on both axes
df.columns = pd.MultiIndex.from_frame(
    df.columns.str.extract(r'([A-Za-z]+)(\d+)')
)
df.index = pd.MultiIndex.from_frame(
    df.index.str.extract(r'([A-Za-z]+)(\d+)')
)

order = ["banana", "pear", "apple"]
df = df.reindex(index=order, columns=order, level=0)

df.index = df.index.map(''.join)
df.columns = df.columns.map(''.join)
print(df)
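
The same steps can be wrapped in a function so that the original frame is left untouched. This is only a sketch: reorder_by_prefix and its pattern parameter are hypothetical names introduced here, and it assumes every label is letters followed by digits.

```python
import pandas as pd


def reorder_by_prefix(df, order, pattern=r"([A-Za-z]+)(\d+)"):
    """Hypothetical helper: reorder rows and columns of df by label prefix."""
    out = df.copy()
    # Split labels into (name, suffix) on both axes
    out.index = pd.MultiIndex.from_frame(out.index.str.extract(pattern))
    out.columns = pd.MultiIndex.from_frame(out.columns.str.extract(pattern))
    # Rearrange the outer level on both axes
    out = out.reindex(index=order, columns=order, level=0)
    # Glue the two string parts back into flat labels
    out.index = out.index.map("".join)
    out.columns = out.columns.map("".join)
    return out


# The question's df, built column by column
labels = ["apple1", "apple2", "banana1", "banana2", "pear1", "pear2"]
data = {
    "apple1": [11, 3, 10, 9, 6, 9],
    "apple2": [12, 4, 11, 8, 5, 8],
    "banana1": [13, 5, 12, 7, 4, 7],
    "banana2": [14, 6, 13, 6, 3, 6],
    "pear1": [15, 7, 14, 5, 2, 5],
    "pear2": [16, 8, 15, 4, 1, 4],
}
df = pd.DataFrame(data, index=labels)

result = reorder_by_prefix(df, ["banana", "pear", "apple"])
print(result)
```

Because the helper works on a copy, df keeps its original flat labels and can be reordered again with a different order.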

(Optional) Convert level 1 to int and sort numerically with sort_index:

# Start from the original df, whose labels are still flat (e.g. 'apple1')
df.columns = pd.MultiIndex.from_frame(
    df.columns.str.extract(r'([A-Za-z]+)(\d+)')
)
df.index = pd.MultiIndex.from_frame(
    df.index.str.extract(r'([A-Za-z]+)(\d+)')
)
# Convert level 1 to int on both index and columns
df.index = [df.index.get_level_values(0),
            df.index.get_level_values(1).astype(int)]

df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).astype(int)]

order = ["banana", "pear", "apple"]
df = (
    df
        .sort_index(level=1, axis=0)  # Sort level 1 on each axis
        .sort_index(level=1, axis=1)
        .reindex(index=order, columns=order, level=0)
)

# Level 1 now holds ints, so convert each part to str before joining
df.index = df.index.map(lambda s: ''.join(map(str, s)))
df.columns = df.columns.map(lambda s: ''.join(map(str, s)))
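
The int conversion only changes the result when a suffix has more than one digit. A minimal sketch with made-up labels: as strings the suffixes sort as 1, 10, 2, while as integers they sort as 1, 2, 10.

```python
import pandas as pd

# Illustrative labels with a two-digit suffix
labels = pd.Index(["pear2", "pear10", "pear1"])
parts = labels.str.extract(r"([A-Za-z]+)(\d+)")

# Suffix kept as text versus converted to int
as_str = pd.MultiIndex.from_frame(parts)
as_int = pd.MultiIndex.from_arrays([parts[0], parts[1].astype(int)])

sorted_str = as_str.sort_values()  # lexicographic order of the suffix
sorted_int = as_int.sort_values()  # numeric order of the suffix
print(sorted_str.tolist())
print(sorted_int.tolist())
```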
