Pivoting a DataFrame with duplicate values in the index

Posted 2024-10-02 20:43:05

I have a pandas DataFrame like this:

    snapDate     instance   waitEvent                   AvgWaitInMs
0   2015-Jul-03  XX         gc cr block 3-way               1
1   2015-Jun-29  YY         gc current block 3-way          2
2   2015-Jul-03  YY         gc current block 3-way          1
3   2015-Jun-29  XX         gc current block 3-way          2
4   2015-Jul-01  XX         gc current block 3-way          2
5   2015-Jul-01  YY         gc current block 3-way          2
6   2015-Jul-03  XX         gc current block 3-way          2
7   2015-Jul-03  YY         log file sync                   9
8   2015-Jun-29  XX         log file sync                   8
9   2015-Jul-03  XX         log file sync                   8
10  2015-Jul-01  XX         log file sync                   8
11  2015-Jul-01  YY         log file sync                   9
12  2015-Jun-29  YY         log file sync                   8

I want to turn it into:

^{pr2}$

I tried pivot, but it returns an error:

    dfWaits.pivot(index='snapDate', columns='waitEvent', values='AvgWaitInMs')

    Index contains duplicate entries, cannot reshape

The result should be another DataFrame.
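
The error can be traced to the key that pivot is given: with only snapDate as the index and waitEvent as the columns, the XX and YY rows for the same date and event collide. The following is a minimal sketch, assuming the table above is rebuilt by hand as dfWaits; the names dup_pairs and dup_triples are made up for illustration.

```python
import pandas as pd

# Rows copied from the table in the question.
rows = [
    ("2015-Jul-03", "XX", "gc cr block 3-way", 1),
    ("2015-Jun-29", "YY", "gc current block 3-way", 2),
    ("2015-Jul-03", "YY", "gc current block 3-way", 1),
    ("2015-Jun-29", "XX", "gc current block 3-way", 2),
    ("2015-Jul-01", "XX", "gc current block 3-way", 2),
    ("2015-Jul-01", "YY", "gc current block 3-way", 2),
    ("2015-Jul-03", "XX", "gc current block 3-way", 2),
    ("2015-Jul-03", "YY", "log file sync", 9),
    ("2015-Jun-29", "XX", "log file sync", 8),
    ("2015-Jul-03", "XX", "log file sync", 8),
    ("2015-Jul-01", "XX", "log file sync", 8),
    ("2015-Jul-01", "YY", "log file sync", 9),
    ("2015-Jun-29", "YY", "log file sync", 8),
]
dfWaits = pd.DataFrame(
    rows, columns=["snapDate", "instance", "waitEvent", "AvgWaitInMs"]
)

# Rows whose (snapDate, waitEvent) pair occurs more than once.
# These are the collisions that make pivot refuse to reshape.
dup_pairs = dfWaits[dfWaits.duplicated(["snapDate", "waitEvent"], keep=False)]
print(dup_pairs)

# With instance added to the key, no combination repeats.
dup_triples = dfWaits.duplicated(["snapDate", "instance", "waitEvent"]).any()
print(dup_triples)
```

This suggests that any reshape has to keep instance as part of the row key, which is what both answers below do.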

2 answers

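Here is one way to reshape the DataFrame into something similar to what you want. Let me know if you have any additional specific requirements for the resulting DataFrame.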

import pandas as pd

# your data
# ====================================
print(df)

       snapDate instance               waitEvent  AvgWaitInMs
0                                                            
0   2015-Jul-03       XX       gc cr block 3-way            1
1   2015-Jun-29       YY  gc current block 3-way            2
2   2015-Jul-03       YY  gc current block 3-way            1
3   2015-Jun-29       XX  gc current block 3-way            2
4   2015-Jul-01       XX  gc current block 3-way            2
5   2015-Jul-01       YY  gc current block 3-way            2
6   2015-Jul-03       XX  gc current block 3-way            2
7   2015-Jul-03       YY           log file sync            9
8   2015-Jun-29       XX           log file sync            8
9   2015-Jul-03       XX           log file sync            8
10  2015-Jul-01       XX           log file sync            8
11  2015-Jul-01       YY           log file sync            9
12  2015-Jun-29       YY           log file sync            8

# processing
# ====================================
df_temp = df.set_index(['snapDate', 'instance', 'waitEvent']).unstack().fillna(0)

df_temp.columns = df_temp.columns.get_level_values(1).values

df_temp = df_temp.reset_index('instance')

print(df_temp)

            instance  gc cr block 3-way  gc current block 3-way  log file sync
snapDate                                                                      
2015-Jul-01       XX                  0                       2              8
2015-Jul-01       YY                  0                       2              9
2015-Jul-03       XX                  1                       2              8
2015-Jul-03       YY                  0                       1              9
2015-Jun-29       XX                  0                       2              8
2015-Jun-29       YY                  0                       2              8
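
For anyone who wants to run this without the input file, here is a self-contained sketch of a close variant. It assumes the data is rebuilt by hand from the printed table, and it selects the AvgWaitInMs column before unstacking so that the columns never become a MultiIndex. This is a variation on the steps above, not the exact code from the answer; the name wide is made up for illustration.

```python
import pandas as pd

# Rows copied from the printed DataFrame above.
rows = [
    ("2015-Jul-03", "XX", "gc cr block 3-way", 1),
    ("2015-Jun-29", "YY", "gc current block 3-way", 2),
    ("2015-Jul-03", "YY", "gc current block 3-way", 1),
    ("2015-Jun-29", "XX", "gc current block 3-way", 2),
    ("2015-Jul-01", "XX", "gc current block 3-way", 2),
    ("2015-Jul-01", "YY", "gc current block 3-way", 2),
    ("2015-Jul-03", "XX", "gc current block 3-way", 2),
    ("2015-Jul-03", "YY", "log file sync", 9),
    ("2015-Jun-29", "XX", "log file sync", 8),
    ("2015-Jul-03", "XX", "log file sync", 8),
    ("2015-Jul-01", "XX", "log file sync", 8),
    ("2015-Jul-01", "YY", "log file sync", 9),
    ("2015-Jun-29", "YY", "log file sync", 8),
]
df = pd.DataFrame(
    rows, columns=["snapDate", "instance", "waitEvent", "AvgWaitInMs"]
)

wide = (
    df.set_index(["snapDate", "instance", "waitEvent"])["AvgWaitInMs"]
    .unstack(fill_value=0)    # waitEvent values become columns, gaps become 0
    .reset_index("instance")  # move instance back to an ordinary column
)
print(wide)
```

Because a single Series is unstacked, the column-flattening step with get_level_values is not needed in this variant.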

You can also use pivot_table:

df.pivot_table(index=['snapDate','instance'], columns='waitEvent', values='AvgWaitInMs')

Out[64]:
waitEvent             gc cr block 3-way  gc current block 3-way  log file sync
snapDate    instance
2015-Jul-01 XX                      NaN                       2              8
            YY                      NaN                       2              9
2015-Jul-03 XX                        1                       2              8
            YY                      NaN                       1              9
2015-Jun-29 XX                      NaN                       2              8
            YY                      NaN                       2              8
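
If zeros are preferred over NaN, pivot_table accepts a fill_value argument. The sketch below assumes the data is rebuilt by hand from the question's table and spells out aggfunc="mean", which is the default. Note that pivot_table aggregates, so if a (snapDate, instance, waitEvent) combination ever repeated, the values would be averaged silently instead of raising an error. The name wide is made up for illustration.

```python
import pandas as pd

# Rows copied from the table in the question.
rows = [
    ("2015-Jul-03", "XX", "gc cr block 3-way", 1),
    ("2015-Jun-29", "YY", "gc current block 3-way", 2),
    ("2015-Jul-03", "YY", "gc current block 3-way", 1),
    ("2015-Jun-29", "XX", "gc current block 3-way", 2),
    ("2015-Jul-01", "XX", "gc current block 3-way", 2),
    ("2015-Jul-01", "YY", "gc current block 3-way", 2),
    ("2015-Jul-03", "XX", "gc current block 3-way", 2),
    ("2015-Jul-03", "YY", "log file sync", 9),
    ("2015-Jun-29", "XX", "log file sync", 8),
    ("2015-Jul-03", "XX", "log file sync", 8),
    ("2015-Jul-01", "XX", "log file sync", 8),
    ("2015-Jul-01", "YY", "log file sync", 9),
    ("2015-Jun-29", "YY", "log file sync", 8),
]
df = pd.DataFrame(
    rows, columns=["snapDate", "instance", "waitEvent", "AvgWaitInMs"]
)

wide = df.pivot_table(
    index=["snapDate", "instance"],
    columns="waitEvent",
    values="AvgWaitInMs",
    aggfunc="mean",  # the default, written out to make the aggregation visible
    fill_value=0,    # missing combinations become 0 instead of NaN
)
print(wide)
```

Calling wide.reset_index("instance") afterwards should give the same layout as the first answer, with snapDate alone as the index.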

Data:

I used the txt file below as input (read into a DataFrame with read_csv from pandas):

^{pr2}$