Changing the row order of a DataFrame without losing or scrambling data

2 answers

User

#1 · edited 2024-05-02 20:12:27

To use reindex, first create an index from the sample column:

df=df.set_index(['sample']).reindex(["CO ref","CO tus","CO raai"]).reset_index()
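A minimal sketch of the reindex approach on a small made-up frame. Only the `sample` column name and its three labels come from the answer; the `value` column and its numbers are assumptions for illustration. Note that `reindex` requires unique index labels, so this route only works when each `sample` value occurs once.

```python
import pandas as pd

# Hypothetical data: 'value' is invented for illustration
df = pd.DataFrame({
    "sample": ["CO raai", "CO ref", "CO tus"],
    "value": [3.0, 1.0, 2.0],
})

order = ["CO ref", "CO tus", "CO raai"]

# Move 'sample' into the index, pull rows out in the requested order,
# then turn the index back into a regular column
result = df.set_index("sample").reindex(order).reset_index()
print(result)

# With repeated 'sample' values, reindex raises a ValueError
dup = pd.concat([df, df], ignore_index=True)
try:
    dup.set_index("sample").reindex(order)
    duplicate_error = None
except ValueError as err:
    duplicate_error = str(err)
print(duplicate_error)
```

If the column contains repeated values, the ordered-categorical route is the safer choice.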

Or use an ordered categorical:

cats = ["CO ref","CO tus","CO raai"]
df['sample'] = pd.CategoricalIndex(df['sample'], ordered=True, categories=cats)
df = df.sort_values('sample')
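A short sketch of the categorical approach on made-up data with repeated `sample` values. It uses `pd.Categorical` rather than `pd.CategoricalIndex`, which is a substitution on my part; the `value` column is an assumption for illustration. Any value not listed in `cats` would become NaN, so the category list should cover every label you want to keep.

```python
import pandas as pd

# Hypothetical data with repeated labels
df = pd.DataFrame({
    "sample": ["CO raai", "CO ref", "CO tus", "CO ref", "CO raai"],
    "value": [5, 1, 3, 2, 6],
})

cats = ["CO ref", "CO tus", "CO raai"]

# Give the column an explicit ordering; sort_values then follows `cats`
df["sample"] = pd.Categorical(df["sample"], categories=cats, ordered=True)

# kind="stable" keeps the original relative order of rows within each label
result = df.sort_values("sample", kind="stable")
print(result)
```

Each row stays intact; only its position changes.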

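User

#2 · edited 2024-05-02 20:12:27

jezrael's solution is of course correct, and most likely the fastest. But since this is really just a matter of restructuring your dataframe, I'd like to show how easily you can do that while also letting your procedure pick which subset of the sorting column's values to use.

The following very simple function lets you specify both the subset and the order of the dataframe: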

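# Subset and reorder a long-format pandas DataFrame.
# Rows whose `order_by` value is not listed in `order` are dropped;
# `order` must contain at least one value.
def order_df(df_input, order_by, order):
    # Collect one block of rows per requested value, in the requested order
    parts = [df_input[df_input[order_by] == var] for var in order]
    # Concatenate once instead of growing a DataFrame inside the loop
    return pd.concat(parts)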
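To make the "subset and order" behaviour concrete, here is a small sketch on made-up data. The function is repeated in compact form so the snippet runs on its own; the frame and its `value` column are assumptions for illustration.

```python
import pandas as pd

def order_df(df_input, order_by, order):
    # One block of rows per requested value, stacked in the requested order
    parts = [df_input[df_input[order_by] == var] for var in order]
    return pd.concat(parts)

# Hypothetical data
df = pd.DataFrame({
    "sample": ["CO raai", "CO ref", "CO tus", "CO ref"],
    "value": [4, 1, 3, 2],
})

# "CO raai" is left out of `order`, so its rows are dropped from the result
subset = order_df(df, order_by="sample", order=["CO tus", "CO ref"])
print(subset)
```

The original index labels are preserved, so each row can still be traced back to its position in the input.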

Here is an example using the iris dataset from plotly express. df['species'].unique() shows the order in which the values of that column appear:

Output:

array(['setosa', 'versicolor', 'virginica'], dtype=object)

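Now, running the complete snippet below with the function above gives you the new order you specified. No categorical variables and no tampering with the index are needed.

Complete code with sample data: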

# imports
import pandas as pd
import plotly.express as px

# data
df = px.data.iris()

# Subset and reorder a long-format pandas DataFrame.
# Rows whose `order_by` value is not listed in `order` are dropped.
def order_df(df_input, order_by, order):
    # Collect one block of rows per requested value, in the requested order
    parts = [df_input[df_input[order_by] == var] for var in order]
    # Concatenate once instead of growing a DataFrame inside the loop
    return pd.concat(parts)

# reorder the rows by species
df_new = order_df(df_input=df, order_by='species', order=['virginica', 'setosa', 'versicolor'])
df_new['species'].unique()

Output:

array(['virginica', 'setosa', 'versicolor'], dtype=object)
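Since the question is about not losing or scrambling data, a quick sanity check can help. The sketch below uses a hypothetical helper, `same_rows`, and made-up data; it compares two frames after sorting both by all columns, so it reports whether they hold the same rows regardless of order.

```python
import pandas as pd

def same_rows(before, after):
    """Return True if both frames hold the same rows, ignoring row order."""
    if len(before) != len(after):
        return False
    cols = list(before.columns)
    # Sort both frames the same way and drop the old index before comparing
    a = before.sort_values(cols).reset_index(drop=True)
    b = after[cols].sort_values(cols).reset_index(drop=True)
    return a.equals(b)

# Hypothetical data
df = pd.DataFrame({
    "sample": ["CO ref", "CO tus", "CO raai"],
    "value": [1.0, 2.0, 3.0],
})

reordered = df.iloc[[2, 0, 1]]   # same rows, new order
truncated = df.iloc[[2, 0]]      # one row missing

print(same_rows(df, reordered))
print(same_rows(df, truncated))
```

When a subset is selected on purpose, the check is expected to fail, because rows were dropped deliberately.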
