How do I convert a multi-index pivot table into nested lists that include part of the index?

Posted 2024-06-25 13:26:21


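Update: I really tried to make my example representative, but it turned out not to be. I have updated the question; the spirit is the same, it is just a bit more complicated.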


I am working with a large pandas dataset from which I want to extract data for plotting. This small example should illustrate it:

import pandas as pd

df = pd.DataFrame({'Name': pd.Categorical(['Carl', 'Carl', 'Carl', 'Tina', 'Tina', 'Tina',
                                           'Carl', 'Carl', 'Tina', 'Tina', 'Carl', 'Carl'] * 2),
                   'DayOfYear': [51, 20, 20, 1, 70, 140, 77, 190, 210, 365, 260, 333] * 2,
                   'Type': pd.Categorical(['Weight'] * 12 + ['Height'] * 12),
                   'Number': [60.3, 61.0, 59.8, 77.1, 74.0, 73.4, 58.2, 60.6, 73.6, 75.0, 59.7, 60.5,
                              172.3, 172.3, 172.3, 165.9, 165.9, 165.9,
                              172.3, 172.3, 165.9, 165.9, 172.3, 172.3],
                  })

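I started with groupby but switched to a pivot table because the workflow seemed simpler, and the result should be the same, right? I have tried many things, and so far this gets me closest to my goal: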

p = pd.pivot_table(df, index=['Name', 'DayOfYear'], values='Number', columns='Type')

[Image: the outcome of pivoting]

For plotting, the remaining step is to convert this into:

what_bqplot_needs_x = [
        [20, 51, 77, 190, 260, 333],
        [1, 70, 140, 210, 365],
        [20, 51, 77, 190, 260, 333],
        [1, 70, 140, 210, 365],
]
what_bqplot_needs_y = [
        [60.4, 60.3, 58.2, 60.6, 59.7, 60.5],
        [77.1, 74.0, 73.4, 73.6, 75.0],
        [172.3] * 6,
        [165.9] * 5,
]

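I don't seem to understand pandas (or get along with it) very well, and I would really like to learn how to massage the data into this shape.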

It doesn't have to be a list of lists; ndarrays would be fine too.


I tried adapting jezrael’s answer, but the very first .reset_index(level=1) already fails:

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

2 Answers

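First, note that if you use pivot_table and there are duplicate (Name, DayOfYear) pairs, the Weight values are aggregated (by mean, the default), as happens with the first two rows: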

df = pd.pivot_table(df, index=['Name', 'DayOfYear'], values='Weight')

Then use reset_index and aggregate each group into lists with groupby(...).agg(list):

df1 = df.reset_index(level=1).groupby(level=0).agg(list)
print (df1)
                        DayOfYear                                Weight
Name                                                                   
Carl  [20, 51, 77, 190, 260, 333]  [60.4, 60.3, 58.2, 60.6, 59.7, 60.5]
Tina       [1, 70, 140, 210, 365]        [77.1, 74.0, 73.4, 73.6, 75.0]

Finally, convert the output to lists:

what_i_want_x = df1['DayOfYear'].tolist()
what_i_want_y = df1['Weight'].tolist()

print (what_i_want_x)
[[20, 51, 77, 190, 260, 333], [1, 70, 140, 210, 365]]

print (what_i_want_y)
[[60.4, 60.3, 58.2, 60.6, 59.7, 60.5], [77.1, 74.0, 73.4, 73.6, 75.0]]

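Edit, for the updated question: the pivoted columns come from the categorical Type column, so reset_index cannot add DayOfYear to them directly. The rename(columns=str) call further down converts the column labels to plain strings first: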

p = pd.pivot_table(df, index=['Name', 'DayOfYear'], values='Number', columns='Type')
print (p)
Type            Height  Weight
Name DayOfYear                
Carl 20          172.3    60.4
     51          172.3    60.3
     77          172.3    58.2
     190         172.3    60.6
     260         172.3    59.7
     333         172.3    60.5
Tina 1           165.9    77.1
     70          165.9    74.0
     140         165.9    73.4
     210         165.9    73.6
     365         165.9    75.0
     
df1 = p.rename(columns=str).reset_index(level=1).groupby(level=0).agg(list)
print (df1)
Type                    DayOfYear                                      Height  \
Name                                                                            
Carl  [20, 51, 77, 190, 260, 333]  [172.3, 172.3, 172.3, 172.3, 172.3, 172.3]   
Tina       [1, 70, 140, 210, 365]         [165.9, 165.9, 165.9, 165.9, 165.9]   

Type                                Weight  
Name                                        
Carl  [60.4, 60.3, 58.2, 60.6, 59.7, 60.5]  
Tina        [77.1, 74.0, 73.4, 73.6, 75.0]  
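To go from df1 to the flat list of lists the question asks for, the per-type columns can be concatenated in the desired order. The following is only a sketch under a few assumptions: the names type_order, what_bqplot_needs_x and what_bqplot_needs_y are chosen here for illustration, and the rounding to one decimal is added only to hide floating-point noise in the averaged values.

```python
import pandas as pd

# Same data as in the updated question.
df = pd.DataFrame({
    'Name': pd.Categorical(['Carl', 'Carl', 'Carl', 'Tina', 'Tina', 'Tina',
                            'Carl', 'Carl', 'Tina', 'Tina', 'Carl', 'Carl'] * 2),
    'DayOfYear': [51, 20, 20, 1, 70, 140, 77, 190, 210, 365, 260, 333] * 2,
    'Type': pd.Categorical(['Weight'] * 12 + ['Height'] * 12),
    'Number': [60.3, 61.0, 59.8, 77.1, 74.0, 73.4, 58.2, 60.6, 73.6, 75.0, 59.7, 60.5,
               172.3, 172.3, 172.3, 165.9, 165.9, 165.9,
               172.3, 172.3, 165.9, 165.9, 172.3, 172.3],
})

# Pivot, then collect one list per name and column, as in the answer above.
p = pd.pivot_table(df, index=['Name', 'DayOfYear'], values='Number', columns='Type')
df1 = p.rename(columns=str).reset_index(level=1).groupby(level=0).agg(list)

# Assumed plotting order: all Weight lines first, then all Height lines.
type_order = ['Weight', 'Height']

# The x values are the same for every type, so repeat them once per type
# (copied, so the inner lists are not shared between entries).
what_bqplot_needs_x = [list(days) for _ in type_order for days in df1['DayOfYear']]

# The y values are taken column by column in the chosen order.
what_bqplot_needs_y = [[round(v, 1) for v in values]
                       for t in type_order for values in df1[t]]

print(what_bqplot_needs_x)
print(what_bqplot_needs_y)
```

Within each type the lines follow the row order of df1, which is Carl and then Tina here.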

jezrael was faster, but here is an alternative using pandas .groupby:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': pd.Categorical(['Carl', 'Carl', 'Carl', 'Tina', 'Tina', 'Tina',
                                           'Carl', 'Carl', 'Tina', 'Tina', 'Carl', 'Carl']),
                   'DayOfYear': [20, 20, 51, 1, 70, 140, 77, 190, 210, 365, 260, 333],
                   'Weight': [61.0, 59.8, 60.3, 77.1, 74.0, 73.4, 58.2, 60.6, 73.6, 75.0, 59.7, 60.5]
                  })

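# Average duplicate (Name, DayOfYear) rows; dropna discards the empty
# combinations that grouping on a categorical column can produce.
df2 = df.groupby(["Name", "DayOfYear"]).mean().dropna().reset_index()
what_i_want_x = [list(df2["DayOfYear"][df2["Name"] == name_selected]) for name_selected in np.unique(df2["Name"])]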

print(what_i_want_x)
[[20, 51, 77, 190, 260, 333], [1, 70, 140, 210, 365]]

what_i_want_y = [list(df2["Weight"][df2["Name"] == name_selected]) for name_selected in np.unique(df2["Name"])]

print(what_i_want_y )
[[60.4, 60.3, 58.2, 60.6, 59.7, 60.5], [77.1, 74.0, 73.4, 73.6, 75.0]]
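For the updated question, where Weight and Height share one Number column, the same groupby idea might be extended by grouping on Type as well. This is a hedged sketch rather than part of the original answer: the variable names tidy, xs and ys, the explicit Weight-then-Height order, and the rounding to one decimal are all assumptions made for illustration.

```python
import pandas as pd

# Same data as in the updated question.
df = pd.DataFrame({
    'Name': pd.Categorical(['Carl', 'Carl', 'Carl', 'Tina', 'Tina', 'Tina',
                            'Carl', 'Carl', 'Tina', 'Tina', 'Carl', 'Carl'] * 2),
    'DayOfYear': [51, 20, 20, 1, 70, 140, 77, 190, 210, 365, 260, 333] * 2,
    'Type': pd.Categorical(['Weight'] * 12 + ['Height'] * 12),
    'Number': [60.3, 61.0, 59.8, 77.1, 74.0, 73.4, 58.2, 60.6, 73.6, 75.0, 59.7, 60.5,
               172.3, 172.3, 172.3, 165.9, 165.9, 165.9,
               172.3, 172.3, 165.9, 165.9, 172.3, 172.3],
})

# Average duplicates per (Type, Name, DayOfYear). observed=True keeps only the
# combinations that actually occur, so no dropna is needed afterwards.
tidy = (df.groupby(['Type', 'Name', 'DayOfYear'], observed=True)['Number']
          .mean()
          .reset_index())

xs, ys = [], []
for type_name in ['Weight', 'Height']:      # assumed plotting order
    for name in ['Carl', 'Tina']:
        part = tidy[(tidy['Type'] == type_name) & (tidy['Name'] == name)]
        xs.append(part['DayOfYear'].tolist())
        # Rounding only hides floating-point noise from the averaging.
        ys.append(part['Number'].round(1).tolist())

print(xs)
print(ys)
```

Because no pivoted columns are involved, this route never calls reset_index on a frame with categorical column labels.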
