Using index values as category values in a DataFrame

Posted 2024-10-01 15:37:54


I have the following DataFrame:

                  beat1   beat2   beat3   beat4   beat5   beat6   beat7  
filename                                                                  
M40_HC_503d.dat  0.7456  0.8574  0.7695  0.8698  0.8315  0.7908  0.8823   
M30_HC_461d.dat  0.7672  0.6682  0.7452  0.6853  0.7488  0.6782  0.6648   
M24_HC_459d.dat  0.6041  0.6439  0.5870  0.7452  0.6714  0.6684  0.6198   
M48_HC_543d.dat  0.8949  0.8570  0.9338  1.0545  1.0681  1.0775  0.8425   
M40_HC_506d.dat  0.7862  0.8917  0.9357  0.8250  0.8521  0.7146  0.7125

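I want to build another DataFrame with two columns, in which the column names beat1 to beat7 are folded into the index. The first column should hold all the values from beat1 to beat7, and the second column should hold the filename each value came from. Like this: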

    values   filename
ind   
0   0.7456  M40_HC_503d.dat
1   0.8574  M40_HC_503d.dat
2   0.7695  M40_HC_503d.dat
3   0.8698  M40_HC_503d.dat
4   0.8315  M40_HC_503d.dat
5   0.7908  M40_HC_503d.dat
6   0.8823  M40_HC_503d.dat
7   0.7672  M30_HC_461d.dat
8   0.6682  M30_HC_461d.dat
9   0.7452  M30_HC_461d.dat
10  0.6853  M30_HC_461d.dat
11  0.7488  M30_HC_461d.dat
12  0.6782  M30_HC_461d.dat
13  0.6648  M30_HC_461d.dat

I have tried many approaches, including transposing, but none of them worked. Any ideas?


2 answers
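import numpy as np
import pandas as pd

v = df.values        # 2-D array of the beat values
i = df.index.values  # 1-D array of the filenames

# Flatten the values row by row into one column, and repeat each
# filename once per beat column so the two line up.
pd.DataFrame(
    np.hstack([v.reshape(-1, 1), i.repeat(v.shape[1])[:, None]]),
    columns=['values', 'filename']
)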

   values         filename
0  0.7456  M40_HC_503d.dat
1  0.8574  M40_HC_503d.dat
2  0.7695  M40_HC_503d.dat
3  0.8698  M40_HC_503d.dat
4  0.8315  M40_HC_503d.dat
5  0.7908  M40_HC_503d.dat
6  0.8823  M40_HC_503d.dat
7  0.7672  M30_HC_461d.dat
8  0.6682  M30_HC_461d.dat
9  0.7452  M30_HC_461d.dat
...
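
Because np.hstack has to place the numbers and the filenames in a single array, the values column may come back as object rather than float. A minimal sketch of the same reshape that builds the two columns separately is shown here; the sample frame is just the first two rows from the question, and the name long_df is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (first two rows only).
df = pd.DataFrame(
    [[0.7456, 0.8574, 0.7695, 0.8698, 0.8315, 0.7908, 0.8823],
     [0.7672, 0.6682, 0.7452, 0.6853, 0.7488, 0.6782, 0.6648]],
    index=pd.Index(['M40_HC_503d.dat', 'M30_HC_461d.dat'], name='filename'),
    columns=[f'beat{k}' for k in range(1, 8)],
)

v = df.to_numpy()        # 2-D float array, one row per file
i = df.index.to_numpy()  # 1-D array of filenames

# Build each column on its own so it keeps its own dtype.
long_df = pd.DataFrame({
    'values': v.ravel(),                    # row-major flatten: all beats of row 0, then row 1
    'filename': np.repeat(i, v.shape[1]),   # each filename repeated once per beat column
})
long_df.index.name = 'ind'  # index name taken from the desired output in the question
```

With this construction the values column should stay numeric, so later arithmetic or plotting does not need an extra conversion step.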

I think you need DataFrame.stack followed by reset_index:

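# stack() moves the beat columns into an inner index level;
# reset_index(0) then turns the filename level back into a column.
df = df.stack().reset_index(0, name='values')
print(df)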
              filename  values
beat1  M40_HC_503d.dat  0.7456
beat2  M40_HC_503d.dat  0.8574
beat3  M40_HC_503d.dat  0.7695
beat4  M40_HC_503d.dat  0.8698
beat5  M40_HC_503d.dat  0.8315
beat6  M40_HC_503d.dat  0.7908
beat7  M40_HC_503d.dat  0.8823
beat1  M30_HC_461d.dat  0.7672
beat2  M30_HC_461d.dat  0.6682
beat3  M30_HC_461d.dat  0.7452
beat4  M30_HC_461d.dat  0.6853
beat5  M30_HC_461d.dat  0.7488
beat6  M30_HC_461d.dat  0.6782
...

Or:

# The second reset_index(drop=True) replaces the beat labels with 0..n-1.
df = df.stack().reset_index(0, name='values').reset_index(drop=True)
print(df)
           filename  values
0   M40_HC_503d.dat  0.7456
1   M40_HC_503d.dat  0.8574
2   M40_HC_503d.dat  0.7695
3   M40_HC_503d.dat  0.8698
4   M40_HC_503d.dat  0.8315
5   M40_HC_503d.dat  0.7908
6   M40_HC_503d.dat  0.8823
7   M30_HC_461d.dat  0.7672
8   M30_HC_461d.dat  0.6682
9   M30_HC_461d.dat  0.7452
10  M30_HC_461d.dat  0.6853
...
...
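
The output above has filename first and values second, whereas the question asks for values first and an index named ind. A self-contained sketch of the stack approach with those two adjustments might look like this; the sample frame is the first two rows from the question and long_df is a hypothetical name.

```python
import pandas as pd

# Sample data copied from the question (first two rows only).
df = pd.DataFrame(
    [[0.7456, 0.8574, 0.7695, 0.8698, 0.8315, 0.7908, 0.8823],
     [0.7672, 0.6682, 0.7452, 0.6853, 0.7488, 0.6782, 0.6648]],
    index=pd.Index(['M40_HC_503d.dat', 'M30_HC_461d.dat'], name='filename'),
    columns=[f'beat{k}' for k in range(1, 8)],
)

long_df = (
    df.stack()                          # Series indexed by (filename, beat)
      .reset_index(0, name='values')    # filename becomes a column, beat stays as index
      .reset_index(drop=True)           # replace the beat labels with 0..n-1
      .loc[:, ['values', 'filename']]   # column order requested in the question
      .rename_axis('ind')               # index name from the desired output
)
print(long_df)
```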

If you need to change the index:

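df = df.stack().reset_index(0, name='values')
# Keep only the digits of each beat label; the raw string avoids an
# invalid-escape warning for \d.
df.index = df.index.str.extract(r'(\d+)', expand=False)
print(df)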
          filename  values
1  M40_HC_503d.dat  0.7456
2  M40_HC_503d.dat  0.8574
3  M40_HC_503d.dat  0.7695
4  M40_HC_503d.dat  0.8698
5  M40_HC_503d.dat  0.8315
6  M40_HC_503d.dat  0.7908
7  M40_HC_503d.dat  0.8823
1  M30_HC_461d.dat  0.7672
2  M30_HC_461d.dat  0.6682
...
...
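
Note that str.extract returns the captured digits as strings, so the index above holds '1', '2', and so on rather than integers. If integer beat numbers are needed, one possible follow-up is to cast the index, as in this sketch; the names out and beat are assumptions for illustration, and the sample frame is the first two rows from the question.

```python
import pandas as pd

# Sample data copied from the question (first two rows only).
df = pd.DataFrame(
    [[0.7456, 0.8574, 0.7695, 0.8698, 0.8315, 0.7908, 0.8823],
     [0.7672, 0.6682, 0.7452, 0.6853, 0.7488, 0.6782, 0.6648]],
    index=pd.Index(['M40_HC_503d.dat', 'M30_HC_461d.dat'], name='filename'),
    columns=[f'beat{k}' for k in range(1, 8)],
)

out = df.stack().reset_index(0, name='values')

# Extract the digits from labels such as 'beat3', then cast them to integers.
out.index = out.index.str.extract(r'(\d+)', expand=False).astype(int)
out.index.name = 'beat'  # hypothetical name, not part of the original answer

# Integer labels allow numeric selection, e.g. every value recorded for beat 1.
first_beat = out.loc[1, 'values']
```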
