pandas: reshape row data and group it into columns

Posted 2024-04-26 05:21:16


I have a DataFrame that used to be in a database-style format (not my choice), where the emphasis is on rows rather than columns.

 import pandas as pd
 df = pd.DataFrame([['John','Sept',1,'Dec',2],['Jane','Sept',1,'Dec',3],['James','Sept',2,'Dec',2]],columns=['Name','Test 1','Score 1','Test 2','Score 2'])

   Name Test 1  Score 1 Test 2  Score 2
0   John   Sept        1    Dec        2
1   Jane   Sept        1    Dec        3
2  James   Sept        2    Dec        2

I would like to convert it to this format:

^{pr2}$

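So basically I want to merge the test columns so that they are grouped by the Name column. So far I have looked at melt() and unstack(), which get me something close to what I am after:

melt = pd.melt(df,id_vars=['Name','Test 1'])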

    Name Test 1 variable value
0   John   Sept  Score 1     1
1   Jane   Sept  Score 1     1
2  James   Sept  Score 1     2
3   John   Sept   Test 2   Dec
4   Jane   Sept   Test 2   Dec
5  James   Sept   Test 2   Dec
6   John   Sept  Score 2     2
7   Jane   Sept  Score 2     3
8  James   Sept  Score 2     2

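I am fairly sure groupby, melt or unstack can get me there, but I just cannot get the syntax right. Any suggestions would be appreciated.

Background: I think (I hope) this new format will let me chart how scores change across test dates.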


2 Answers

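There may be a way to do this with those functions, but you can also do without them: split the data into two DataFrames and then stack them with append(). On pandas 2.0 and later, where DataFrame.append() was removed, use pd.concat() instead.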

df = pd.DataFrame([['John','Sept',1,'Dec',2],['Jane','Sept',1,'Dec',3],['James','Sept',2,'Dec',2]],columns=['Name','Test 1','Score 1','Test 2','Score 2'])

# split off frame 1
df1 = df.loc[:,['Name','Test 1','Score 1']].copy()
df1.columns = ['Name','Date','Score']
df1['Test'] = 1
df1
Out[4]:
Name    Date    Score   Test
John    Sept    1       1
Jane    Sept    1       1
James   Sept    2       1

# split off frame 2
df2 = df.loc[:,['Name','Test 2','Score 2']].copy()
df2.columns = ['Name','Date','Score']
df2['Test'] = 2
df2
Out[5]:
Name    Date    Score   Test
John    Dec     2       2
Jane    Dec     3       2 
James   Dec     2       2

# combine the two frames
# DataFrame.append() was removed in pandas 2.0; pd.concat() stacks the frames the same way
df = pd.concat([df1, df2])
df.sort_values('Name')
Out[6]:
Name    Date    Score   Test
James   Sept    2       1
James   Dec     2       2
Jane    Sept    1       1
Jane    Dec     3       2
John    Sept    1       1
John    Dec     2       2
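The split-and-stack steps above can be sketched as one self-contained script. This is only an illustration: the helper name `stack_tests` and its `test_numbers` parameter are hypothetical, and it assumes the columns follow the `Test N` / `Score N` naming from the question. It uses `pd.concat()` rather than `append()` so that it also runs on pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame(
    [['John', 'Sept', 1, 'Dec', 2],
     ['Jane', 'Sept', 1, 'Dec', 3],
     ['James', 'Sept', 2, 'Dec', 2]],
    columns=['Name', 'Test 1', 'Score 1', 'Test 2', 'Score 2'],
)


def stack_tests(wide, test_numbers=(1, 2)):
    """Hypothetical helper: build one frame per test number and stack them."""
    frames = []
    for n in test_numbers:
        # Copy so that adding the 'Test' column does not touch the original frame
        part = wide[['Name', f'Test {n}', f'Score {n}']].copy()
        part.columns = ['Name', 'Date', 'Score']
        part['Test'] = n
        frames.append(part)
    # ignore_index=True gives the stacked frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Sort by name, then by test number, and renumber the rows
long_df = stack_tests(df).sort_values(['Name', 'Test']).reset_index(drop=True)
print(long_df)
```

Sorting on both `Name` and `Test` keeps each person's rows in test order, which a sort on `Name` alone does not guarantee.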

You can use lreshape:

df['T1'] = 1
df['T2'] = 2

df = (pd.lreshape(df, {'Test': ['T1', 'T2'],
                       'Date': ['Test 1', 'Test 2'], 
                       'Score': ['Score 1', 'Score 2']}))

#reorder columns, sort dataframe by Name
df = df[['Name','Test','Date','Score']].sort_values('Name', ascending=False)
print(df)

    Name  Test  Date  Score
0   John     1  Sept      1
3   John     2   Dec      2
1   Jane     1  Sept      1
4   Jane     2   Dec      3
2  James     1  Sept      2
5  James     2   Dec      2

The pd.lreshape documentation is not very good, but you can use:

^{pr2}$
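As a further option that neither answer uses, the same reshape can probably be done with `pd.wide_to_long`, which is documented in the pandas API reference. The sketch below assumes the `Test N` / `Score N` column naming from the question and that `Name` uniquely identifies each row (which `wide_to_long` requires). The intermediate column name `TestNo` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['John', 'Sept', 1, 'Dec', 2],
     ['Jane', 'Sept', 1, 'Dec', 3],
     ['James', 'Sept', 2, 'Dec', 2]],
    columns=['Name', 'Test 1', 'Score 1', 'Test 2', 'Score 2'],
)

long_df = (
    # 'Test 1'/'Test 2' and 'Score 1'/'Score 2' are split on the space;
    # the numeric suffix becomes the 'TestNo' index level
    pd.wide_to_long(df, stubnames=['Test', 'Score'], i='Name', j='TestNo', sep=' ')
    .reset_index()
    # The old 'Test' columns hold dates; the suffix is the test number
    .rename(columns={'Test': 'Date', 'TestNo': 'Test'})
    [['Name', 'Test', 'Date', 'Score']]
    .sort_values(['Name', 'Test'])
    .reset_index(drop=True)
)
print(long_df)
```

Compared with `lreshape`, this avoids adding the helper columns `T1` and `T2`, because the test number is taken from the column suffix.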
