Stacking a DataFrame

Posted 2024-06-28 11:21:25


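I have a DataFrame that I need to stack, melt, or unpivot. For each school, I need one new row per capacity value and a new column holding the level. A level should appear only if its capacity is > 0. The city column should also be kept: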

import pandas as pd

data = pd.DataFrame({'school_name': {0: 'a', 1: 'b', 2: 'c'},
                     'primary': {0: 1, 1: 3, 2: 0},
                     'secondary': {0: 2, 1: 0, 2: 6},
                     'tertiary': {0: 3, 1: 6, 2: 0},
                     'city': {0: 'Bangkok', 1: 'Frankfurt', 2: 'Tel Aviv'}})
data

  school_name  primary  secondary  tertiary       city
0           a        1          2         3    Bangkok
1           b        3          0         6  Frankfurt
2           c        0          6         0   Tel Aviv

Expected result:

    school_name levels     capacity    city
0   a           primary     1          Bangkok
1   a           secondary   2          Bangkok
2   a           tertiary    3          Bangkok
3   b           primary     3          Frankfurt
4   b           tertiary    6          Frankfurt
5   c           secondary   6          Tel Aviv

3 Answers

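You can reshape the data with pivot_longer from pyjanitor and then keep only the rows with a capacity greater than 0. (This snippet and the wide_to_long one after it appear to target an earlier version of the question, where df had value columns named like primary_capacity.)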

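import janitor  # pyjanitor; importing it registers pivot_longer on DataFrame

(df
 # split each column name on "_": the first part becomes the "levels" value,
 # the second part (".value") becomes the name of the value column
 .pivot_longer(index=['school_name', 'city'],
               names_to=("levels", ".value"),
               names_sep="_")
 .query("capacity > 0")
 )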


  school_name       city     levels  capacity
0           a    Bangkok    primary         1
1           b  Frankfurt    primary         3
3           a    Bangkok  secondary         2
5           c   Tel Aviv  secondary         6
6           a    Bangkok   tertiary         3
7           b  Frankfurt   tertiary         6

You can also use pandas wide_to_long:

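# wide_to_long expects the stub name first, so flip e.g.
# "primary_capacity" into "capacity_primary"
temp = df.rename(columns=lambda x: "_".join(x.split("_")[::-1])
                          if 'capacity' in x
                          else x)

(pd.wide_to_long(temp,
                 stubnames='capacity',
                 i=['school_name', 'city'],
                 j='levels',
                 sep='_',
                 suffix='.+')  # the suffix is text, not the default digits
   .query('capacity > 0')
   .reset_index()
  )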

  school_name       city     levels  capacity
0           a    Bangkok    primary         1
1           a    Bangkok  secondary         2
2           a    Bangkok   tertiary         3
3           b  Frankfurt    primary         3
4           b  Frankfurt   tertiary         6
5           c   Tel Aviv  secondary         6
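
The wide_to_long snippet above relies on a df that is not shown in the thread. Below is a minimal, self-contained sketch of the same idea. It assumes (this is a guess at the pre-edit question, not something the thread confirms) that the value columns were named primary_capacity, secondary_capacity, and tertiary_capacity; the frame itself is made up for illustration.

```python
import pandas as pd

# Hypothetical pre-edit layout: value columns carry a "_capacity" suffix.
df = pd.DataFrame({
    "school_name": ["a", "b", "c"],
    "primary_capacity": [1, 3, 0],
    "secondary_capacity": [2, 0, 6],
    "tertiary_capacity": [3, 6, 0],
    "city": ["Bangkok", "Frankfurt", "Tel Aviv"],
})

# wide_to_long looks for "<stub><sep><suffix>", so put the stub first:
# "primary_capacity" -> "capacity_primary".
temp = df.rename(
    columns=lambda c: "_".join(c.split("_")[::-1]) if c.endswith("_capacity") else c
)

long_df = (
    pd.wide_to_long(
        temp,
        stubnames="capacity",
        i=["school_name", "city"],
        j="levels",
        sep="_",
        suffix=".+",  # suffixes are words here, not the default digits
    )
    .query("capacity > 0")  # keep only levels that actually exist
    .sort_index()           # order by school, city, then level name
    .reset_index()
)
```

The rename uses endswith("_capacity") rather than a substring test so that school_name is never touched, even if an identifier column happened to contain the word "capacity".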

Your question has since been edited, so a plain melt on the data frame from the question should suffice:

(data.melt(['school_name', 'city'], 
           var_name='levels', 
           value_name='capacity')
    .query('capacity > 0'))

  school_name       city     levels  capacity
0           a    Bangkok    primary         1
1           b  Frankfurt    primary         3
3           a    Bangkok  secondary         2
5           c   Tel Aviv  secondary         6
6           a    Bangkok   tertiary         3
7           b  Frankfurt   tertiary         6
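
The melt output above has the right rows, but not the row order, index, or column order of the expected result in the question. As a sketch of how one might close that gap (the stable sort and the column selection are additions for illustration, not part of the original answer):

```python
import pandas as pd

data = pd.DataFrame({
    "school_name": ["a", "b", "c"],
    "primary": [1, 3, 0],
    "secondary": [2, 0, 6],
    "tertiary": [3, 6, 0],
    "city": ["Bangkok", "Frankfurt", "Tel Aviv"],
})

result = (
    data.melt(id_vars=["school_name", "city"],
              var_name="levels",
              value_name="capacity")
    .query("capacity > 0")
    # A stable sort groups rows by school while keeping melt's
    # primary -> secondary -> tertiary order within each school.
    .sort_values("school_name", kind="stable")
    .reset_index(drop=True)
    # Match the column order of the expected result.
    .loc[:, ["school_name", "levels", "capacity", "city"]]
)
```

Sorting only on school_name with kind="stable" avoids having to encode the level order explicitly; if the level columns were not already in the desired order, an ordered categorical for levels would be one alternative.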

I would replace the 0 values with NaN, since I take a 0 to mean that the level does not exist at that school:

import numpy as np
data2 = data.replace(0, np.nan)

I think what you need is melt, followed by dropping the NaN values:

data2.melt(id_vars= ['school_name', 'city'], value_vars=['primary', 'secondary', 'tertiary']).dropna()

   school_name  city        variable    value
0   a           Bangkok     primary     1.0
1   b           Frankfurt   primary     3.0
3   a           Bangkok     secondary   2.0
5   c           Tel Aviv    secondary   6.0
6   a           Bangkok     tertiary    3.0
7   b           Frankfurt   tertiary    6.0

If you don't like the gaps in the index, reset it:

data2.melt(id_vars= ['school_name', 'city'], value_vars=['primary', 'secondary', 'tertiary']).dropna().reset_index(drop=True)
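
One side effect of the NaN approach is visible in the output above: the capacities come back as floats (1.0, 3.0, ...), and the columns keep melt's default names variable and value. A possible way to restore integers and the column names from the question, sketched here with var_name, value_name, and astype as additions to the original answer:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "school_name": ["a", "b", "c"],
    "primary": [1, 3, 0],
    "secondary": [2, 0, 6],
    "tertiary": [3, 6, 0],
    "city": ["Bangkok", "Frankfurt", "Tel Aviv"],
})

# Treat 0 as "this level does not exist"; this turns the int columns into floats.
data2 = data.replace(0, np.nan)

long_df = (
    data2.melt(id_vars=["school_name", "city"],
               value_vars=["primary", "secondary", "tertiary"],
               var_name="levels",
               value_name="capacity")
    .dropna(subset=["capacity"])     # drop the masked (formerly 0) entries
    .astype({"capacity": "int64"})   # safe once no NaN is left
    .reset_index(drop=True)
)
```

The cast has to come after dropna, because a plain int64 column cannot hold NaN.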

Let's use stack to reshape the DataFrame, after masking the 0 values in the primary, secondary, and tertiary columns:

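df = data.set_index(['school_name', 'city'])
# mask zeros as NaN, then stack the level columns into rows; the explicit
# dropna() gives the same result whether or not stack drops NaN itself
df = df[df.ne(0)].stack().dropna().reset_index(name='capacity')\
                 .rename(columns={'level_2': 'levels'})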

>>> df

  school_name       city     levels   capacity
0           a    Bangkok    primary       1.0
1           a    Bangkok  secondary       2.0
2           a    Bangkok   tertiary       3.0
3           b  Frankfurt    primary       3.0
4           b  Frankfurt   tertiary       6.0
5           c   Tel Aviv  secondary       6.0
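
As a variation on the stack approach, the sketch below names the column axis up front with rename_axis, so the stacked level is already called levels and no rename of level_2 is needed. It also drops NaN explicitly, since how stack treats missing values appears to depend on the pandas version (the future_stack option), and casts the capacities back to integers. These choices are illustrative additions, not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({
    "school_name": ["a", "b", "c"],
    "primary": [1, 3, 0],
    "secondary": [2, 0, 6],
    "tertiary": [3, 6, 0],
    "city": ["Bangkok", "Frankfurt", "Tel Aviv"],
})

# Keep the identifier columns in the index and name the column axis "levels".
wide = data.set_index(["school_name", "city"]).rename_axis(columns="levels")

stacked = (
    wide.where(wide.ne(0))  # mask zeros as NaN
    .stack()                # move the level columns into the row index
    .dropna()               # explicit, in case stack itself keeps NaN
    .astype("int64")        # masking produced floats; restore integers
)

result = stacked.reset_index(name="capacity")
```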
