Pandas: ungrouping and melting indented records

Posted 2024-09-30 00:26:00


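I'm new to Python and pandas. Is it possible to ungroup and melt a DataFrame like the one below?

The groups in the source data all sit in a single column, and the nesting is marked only by leading spaces. It looks like this: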

import pandas as pd
import numpy
df = pd.DataFrame([
    ['Costs', numpy.nan, numpy.nan, numpy.nan],
    ['  Vehicles', numpy.nan, numpy.nan, numpy.nan],
    ['    Cars', numpy.nan, numpy.nan, numpy.nan],
    ['      BMW', 1000, 1100, 1010],
    ['      Toyota', 1203, 1302, 1322],
    ['    Cars - Total', 2203, 2402, 2332],
    ['    Trucks', numpy.nan, numpy.nan, numpy.nan],
    ['      Volvo', 5000, 5001, 5010],
    ['      MAN', 5500, 5055, 5066],
    ['    Trucks - Total', 10500, 10056, 10076],
    ['  Vehicles - Total', 12703, 12458, 12408],
    ['  Crew', numpy.nan, numpy.nan, numpy.nan],
    ['    Gomez Addams', 10000, 10000, 10000],
    ['    Morticia Addams', 10000, 10000, 10000],
    ['  Crew - Total', 20000, 20000, 20000],
    ['Costs - Total', 32703, 32458, 32408],
    ],    
    columns=['Level', 'Q1_2019', 'Q2_2019', 'Q3_2019'])

I need to convert it into a table like this:

Level, Sublevel1, Sublevel2, Sublevel3, Sublevel4, Date, Value
"Costs", "Vehicles", "Cars", "BMW", "Q1_2019", 1000
"Costs", "Crew", "Gomez Addams", , "Q1_2019", 10000

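At the moment I create extra "Sublevel" columns, backfill them using regular expressions, fill the sublevel gaps row by row, and then apply melt(). Can this be done in a more Pythonic way?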


1 Answer

Answer 1 · posted 2024-09-30 00:26:00

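There may be a more succinct way, but the idea is to use the "Total" rows to identify the groups, and then to propagate each group name with a backward and forward fill.

We then drop any group that occurs only once (the roll-up totals) together with the header rows that contain NaN, and melt the remaining leaf rows, which score 1 or 2 on the whitespace count below.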

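# Count runs of whitespace in each label. In this data, indented leaf rows
# score 1 or 2 ('      BMW', '    Gomez Addams') and indented 'X - Total'
# rows score 3, so this is a heuristic rather than a true nesting depth.
df['sub_level'] = df['Level'].str.count(r'\s+')

# On each 'Total' row, record the group name without the ' - Total' suffix.
is_total = df['Level'].str.contains('Total')
df.loc[is_total, 'group'] = (
    df['Level'].str.strip().str.replace(' - Total', '', regex=False)
)

# Propagate each group name back to the rows that precede its 'Total' row.
df['group'] = df['group'].bfill().ffill()

# Drop single-row groups (roll-up totals), then header rows that hold NaN.
df = df[df.groupby('group')['group'].transform('count') > 1].dropna(how='any')

# Keep the leaf rows, strip their indentation, and melt quarters into rows.
leaves = (
    df.loc[df['sub_level'].isin([1, 2])]
    .drop('sub_level', axis=1)
    .assign(Level=lambda d: d['Level'].str.strip())
)
final_df = pd.melt(leaves, id_vars=['Level', 'group'])

final_df.columns = ['Level', 'Type', 'Date', 'Value']

print(final_df)

              Level    Type     Date    Value
0               BMW    Cars  Q1_2019   1000.0
1            Toyota    Cars  Q1_2019   1203.0
2             Volvo  Trucks  Q1_2019   5000.0
3               MAN  Trucks  Q1_2019   5500.0
4      Gomez Addams    Crew  Q1_2019  10000.0
5   Morticia Addams    Crew  Q1_2019  10000.0
6               BMW    Cars  Q2_2019   1100.0
7            Toyota    Cars  Q2_2019   1302.0
8             Volvo  Trucks  Q2_2019   5001.0
9               MAN  Trucks  Q2_2019   5055.0
10     Gomez Addams    Crew  Q2_2019  10000.0
11  Morticia Addams    Crew  Q2_2019  10000.0
12              BMW    Cars  Q3_2019   1010.0
13           Toyota    Cars  Q3_2019   1322.0
14            Volvo  Trucks  Q3_2019   5010.0
15              MAN  Trucks  Q3_2019   5066.0
16     Gomez Addams    Crew  Q3_2019  10000.0
17  Morticia Addams    Crew  Q3_2019  10000.0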
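
The result above keeps only the immediate parent of each leaf. If the full hierarchy from the question (Level, Sublevel1, Sublevel2, ...) is needed, one possible approach is to read the nesting depth from the indentation and keep a stack of ancestors while walking the rows. The following is only a sketch under two assumptions: every nesting level adds exactly two leading spaces, and subtotal rows always end in " - Total". The names `INDENT`, `build_paths`, `is_leaf` and `tidy` are illustrative and do not come from the question or the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['Costs', np.nan, np.nan, np.nan],
        ['  Vehicles', np.nan, np.nan, np.nan],
        ['    Cars', np.nan, np.nan, np.nan],
        ['      BMW', 1000, 1100, 1010],
        ['      Toyota', 1203, 1302, 1322],
        ['    Cars - Total', 2203, 2402, 2332],
        ['    Trucks', np.nan, np.nan, np.nan],
        ['      Volvo', 5000, 5001, 5010],
        ['      MAN', 5500, 5055, 5066],
        ['    Trucks - Total', 10500, 10056, 10076],
        ['  Vehicles - Total', 12703, 12458, 12408],
        ['  Crew', np.nan, np.nan, np.nan],
        ['    Gomez Addams', 10000, 10000, 10000],
        ['    Morticia Addams', 10000, 10000, 10000],
        ['  Crew - Total', 20000, 20000, 20000],
        ['Costs - Total', 32703, 32458, 32408],
    ],
    columns=['Level', 'Q1_2019', 'Q2_2019', 'Q3_2019'],
)

INDENT = 2  # assumption: each nesting level adds two leading spaces


def build_paths(labels, indent=INDENT):
    """Return, for every label, the list of names from the root down to it."""
    stack, paths = [], []
    for raw in labels:
        depth = (len(raw) - len(raw.lstrip(' '))) // indent
        del stack[depth:]          # forget siblings and anything deeper
        stack.append(raw.strip())
        paths.append(list(stack))
    return paths


value_cols = ['Q1_2019', 'Q2_2019', 'Q3_2019']

# Leaf rows carry values and are not subtotal rows (assumed ' - Total' suffix).
is_leaf = (
    df[value_cols].notna().all(axis=1)
    & ~df['Level'].str.endswith(' - Total')
)

# Pad shorter paths with None so that every row has the same number of levels.
leaf_paths = [p for p, keep in zip(build_paths(df['Level']), is_leaf) if keep]
width = max(len(p) for p in leaf_paths)
rows = [p + [None] * (width - len(p)) for p in leaf_paths]

level_cols = ['Level'] + [f'Sublevel{i}' for i in range(1, width)]
hier = pd.DataFrame(rows, columns=level_cols, index=df.index[is_leaf])

# Attach the quarterly values and unpivot them into Date / Value pairs.
tidy = hier.join(df.loc[is_leaf, value_cols]).melt(
    id_vars=level_cols, var_name='Date', value_name='Value'
)
print(tidy)
```

With this data, shallower leaves such as "Gomez Addams" are left with an empty last sublevel, which matches the second example row in the question. If the indentation width varies between files, `INDENT` would have to be adjusted or inferred.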
