How to sort the columns of a pandas DataFrame, other than the index columns, after a pivot

Posted 2024-10-01 15:38:18


So I have a DataFrame:

import pandas as pd

testdf = pd.DataFrame({"loc": ["ab12", "bc12", "cd12", "ab12", "bc13", "cd12"],
                       "months": ["Jun21", "Jun21", "July21", "July21", "Aug21", "Aug21"],
                       "dept": ["dep1", "dep2", "dep3", "dep2", "dep1", "dep3"],
                       "count": [15, 16, 15, 92, 90, 2]})

It looks like this:

[image: the testdf DataFrame]

When I pivot it:

df = pd.pivot_table(testdf, values=['count'], index=['loc', 'dept'], columns=['months'], aggfunc='sum').reset_index()
df.columns = df.columns.droplevel(0)
df

It looks like this:

[image: the pivoted DataFrame]

I'm looking for a way to sort only the month columns, leaving the first two columns (loc and dept) where they are.

When I try this:

df.sort_values(by = ['Jun21'],ascending = False, inplace = True, axis = 1, ignore_index=True)[2:]

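it gives me an error.

I want the columns to be in the order Jun21, July21, Aug21.

I'm looking for something dynamic, so that I don't have to change the order by hand as the months change.

Any hints would be greatly appreciated.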


2 answers

This is quite simple if you use groupby and then reindex the month level of the columns:

df = testdf.groupby(['loc', 'dept', 'months']).sum().unstack(level=2)
df = df.reindex(['Jun21', 'July21', 'Aug21'], axis=1, level=1)

Output

          count             
months    Jun21 July21 Aug21
loc  dept                   
ab12 dep1  15.0    NaN   NaN
     dep2   NaN   92.0   NaN
bc12 dep2  16.0    NaN   NaN
bc13 dep1   NaN    NaN  90.0
cd12 dep3   NaN   15.0   2.0
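
The reindex call above hard-codes the month list, while the question asks for something dynamic. A minimal sketch of one way to derive the order from the data follows. The helper name `month_key` is hypothetical, and the sketch assumes every label starts with a three-letter English month abbreviation and ends with a two-digit year (so "July21" is read as "Jul21"), under an English locale for `%b`.

```python
import pandas as pd

testdf = pd.DataFrame({
    "loc": ["ab12", "bc12", "cd12", "ab12", "bc13", "cd12"],
    "months": ["Jun21", "Jun21", "July21", "July21", "Aug21", "Aug21"],
    "dept": ["dep1", "dep2", "dep3", "dep2", "dep1", "dep3"],
    "count": [15, 16, 15, 92, 90, 2],
})


def month_key(label):
    # Hypothetical helper: keep the first three letters and the last two
    # digits so that "July21" becomes "Jul21" and matches "%b%y".
    return pd.to_datetime(label[:3] + label[-2:], format="%b%y")


# Chronological order computed from the data instead of a hard-coded list.
month_order = sorted(testdf["months"].unique(), key=month_key)

df = testdf.groupby(["loc", "dept", "months"]).sum().unstack(level=2)
df = df.reindex(month_order, axis=1, level=1)
```

When a new month appears in `testdf`, `month_order` picks it up without any manual change, as long as its label follows the same pattern.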

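We can start by converting the months column to datetime. Since %b only matches three-letter month abbreviations, each label is first trimmed to its first three letters plus the two-digit year, so that "July21" is parsed as "Jul21":

>>> testdf.months = pd.to_datetime(testdf.months.str[:3] + testdf.months.str[-2:], format="%b%y", errors='coerce')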
>>> testdf
    loc     months      dept    count
0   ab12    2021-06-01  dep1    15
1   bc12    2021-06-01  dep2    16
2   cd12    2021-07-01  dep3    15
3   ab12    2021-07-01  dep2    92
4   bc13    2021-08-01  dep1    90
5   cd12    2021-08-01  dep3    2

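Then we apply your code to build the pivot. Because the column labels are now timestamps, they come out in chronological order: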

>>> df = pd.pivot_table(testdf, values=['count'], index=['loc', 'dept'], columns=['months'], aggfunc='sum').reset_index()
>>> df.columns = df.columns.droplevel(0)
>>> df
months  NaT     NaT     2021-06-01  2021-07-01  2021-08-01
0       ab12    dep1    15.0        NaN         NaN
1       ab12    dep2    NaN         92.0        NaN
2       bc12    dep2    16.0        NaN         NaN
3       bc13    dep1    NaN         NaN         90.0
4       cd12    dep3    NaN         15.0        2.0

Finally, we can use strftime to reformat the column names and get the expected result:

>>> df.columns = df.columns.map(lambda t: t.strftime('%b%y') if pd.notnull(t) else '')
>>> df
months                  Jun21   Jul21   Aug21
0       ab12    dep1    15.0    NaN     NaN
1       ab12    dep2    NaN     92.0    NaN
2       bc12    dep2    16.0    NaN     NaN
3       bc13    dep1    NaN     NaN     90.0
4       cd12    dep3    NaN     15.0    2.0
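
Note that the result above relabels "July21" as "Jul21" and leaves the loc and dept columns without names. If the original labels should be preserved, one possible variant is to sort the pivoted columns with a key function instead of converting the data. This is only a sketch: the names `to_month_start`, `wide` and `result` are hypothetical, it assumes pandas 1.1 or later (for the `key` argument of `sort_index`), and it relies on the same label pattern and English locale as the conversion above.

```python
import pandas as pd

testdf = pd.DataFrame({
    "loc": ["ab12", "bc12", "cd12", "ab12", "bc13", "cd12"],
    "months": ["Jun21", "Jun21", "July21", "July21", "Aug21", "Aug21"],
    "dept": ["dep1", "dep2", "dep3", "dep2", "dep1", "dep3"],
    "count": [15, 16, 15, 92, 90, 2],
})

# values="count" (a string, not a list) gives a single column level,
# so no droplevel is needed; loc and dept stay in the index for now.
wide = testdf.pivot_table(values="count", index=["loc", "dept"],
                          columns="months", aggfunc="sum")


def to_month_start(labels):
    # Hypothetical key function: receives the column Index and returns
    # timestamps, reading "July21" as "Jul21" so that "%b%y" matches.
    return pd.to_datetime(labels.str[:3] + labels.str[-2:], format="%b%y")


# Sort only the month columns chronologically, then bring loc and dept
# back as the first two ordinary columns.
wide = wide.sort_index(axis=1, key=to_month_start)
result = wide.reset_index()
```

Because loc and dept are in the index while the columns are sorted, they are never part of the sort, and `reset_index` puts them back in front with their names intact.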
