ValueError: cannot reindex from a duplicate axis, with no duplicate axis values

Posted 2024-10-01 11:23:43


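I group a DataFrame by year (one level of the column MultiIndex), apply a function that pads each group out to 11 columns (adding as many empty columns as needed), and return the padded DataFrame. But this raises an error.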

finalFormat = (penultimateFormatNot11Columns.groupby( level = 'Year', 
                                                      axis  = 1 )
                                            .apply( padDFToXColumns )
              )




raise ValueError("cannot reindex from a duplicate axis")

Inside the applied padding function, the returned paddedDF has no duplicate labels on either axis.
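A quick way to check a claim like this is to ask pandas directly whether each axis is unique. The sketch below is an illustration only: `duplicated_labels` and the `demo` frame are hypothetical and do not come from the original post. Keep in mind that a check inside the applied function sees one group at a time; labels that are unique within every group can still collide once the per-group results are combined, although the post does not confirm that this is what happens here.

```python
import pandas as pd


def duplicated_labels(df):
    """Return the labels that occur more than once on each axis of df."""
    return {
        "index": df.index[df.index.duplicated()].unique().tolist(),
        "columns": df.columns[df.columns.duplicated()].unique().tolist(),
    }


# Hypothetical frame: the same padding column name appears twice, as could
# happen if several padded groups were placed side by side.
demo = pd.DataFrame(
    [[1.0, 2.0, None, None]],
    columns=["a", "b", "EmptyColumn3", "EmptyColumn3"],
)

print(demo.columns.is_unique)   # False for this demo frame
print(duplicated_labels(demo))  # lists 'EmptyColumn3' under "columns"
```

Running the same check on the combined result, not only on each group, would show whether the duplicate labels appear after the groups are put back together.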


Do you know where this error comes from?

The padding function

import numpy as np
import pandas as pd

def padDFToXColumns( df, TOT_COLUMNS = 11 ):
    """
    Pad out the number of columns in df to TOT_COLUMNS
    (add TOT_COLUMNS - len(df.columns) empty columns)
    """

    numColsInDF = len(df.columns)
    if numColsInDF > TOT_COLUMNS:
        print("ERROR: Number Of Columns (%s) Exceeds Max Columns (%s)" % (numColsInDF, TOT_COLUMNS))
        return  # note: the caller receives None in this case

    ### Add Empty Columns ###
    numColsToAdd = TOT_COLUMNS - numColsInDF
    # Names continue the count from the existing columns, e.g. EmptyColumn7 ... EmptyColumn11
    columnsToAdd = [ 'EmptyColumn' + str(num) for num in range(numColsInDF + 1, TOT_COLUMNS + 1) ]
    # All-NaN frame indexed 0 .. len(df) - 1; join() below aligns on index labels
    emptyColumns = pd.DataFrame( columns = columnsToAdd, index = np.arange(len(df.index)) )

    paddedDF = df.join(emptyColumns)
    #paddedDF.reset_index( drop = True, inplace = True )

    return paddedDF
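For comparison, here is a minimal sketch of one possible way to reach the same layout without `groupby(..., axis=1)` (which is deprecated in recent pandas releases). It is not the original poster's method. The function `pad_year_blocks`, its parameters, and the `demo` frame are hypothetical, and the sketch assumes the columns are a MultiIndex with at least two named levels. Each year's block gets padding columns whose labels include the year, so the combined columns stay unique.

```python
import numpy as np
import pandas as pd


def pad_year_blocks(df, level="Year", total_columns=11):
    """Pad each block of columns sharing a `level` value to total_columns."""
    names = list(df.columns.names)
    level_pos = names.index(level)
    # Put the padding name in the first level that is not the year level.
    name_pos = 0 if level_pos != 0 else 1
    years = df.columns.get_level_values(level)

    blocks = []
    for year in years.unique():
        block = df.loc[:, years == year]
        if block.shape[1] > total_columns:
            raise ValueError(
                "%s columns for %s exceeds the maximum of %s"
                % (block.shape[1], year, total_columns)
            )
        pad_labels = []
        for num in range(block.shape[1] + 1, total_columns + 1):
            label = [""] * len(names)
            label[level_pos] = year
            label[name_pos] = "EmptyColumn%d" % num
            pad_labels.append(tuple(label))
        if pad_labels:
            padding = pd.DataFrame(
                np.nan,
                index=df.index,  # same row labels as the data being padded
                columns=pd.MultiIndex.from_tuples(pad_labels, names=names),
            )
            block = pd.concat([block, padding], axis=1)
        blocks.append(block)
    return pd.concat(blocks, axis=1)


# Hypothetical two-level example: two years, two real columns per year.
demo = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    index=["1.White", "2.Black"],
    columns=pd.MultiIndex.from_product(
        [[1996, 1997], ["1.Female", "2.Male"]], names=["Year", "Gender"]
    ),
)
result = pad_year_blocks(demo, level="Year", total_columns=4)
print(result)
```

Because the padding frame reuses `df.index`, no row alignment by position is needed, and because every padding label carries its year, the final `concat` cannot produce repeated column labels.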

The DataFrame

>>> mydata.head()

     SurveyYear  Age        Race    Gender  WeightAdjusted
0        1996   39     1.White  1.Female         1039.13
1        1996    9     1.White    2.Male          995.13
2        1996    8     1.White    2.Male          775.66
3        1996   39     1.White    2.Male          404.28
4        1996   33  3.Hispanic  1.Female          404.28

>>> groupbyKeys = ['SurveyYear', 'Age', 'Race', 'Gender']
>>> cellPopulations = mydata.groupby(groupbyKeys).agg( {'WeightAdjusted':'sum'})
>>> cellPopulations.head(20)
                                    WeightAdjusted
SurveyYear Age Race       Gender                  
1996       0   1.White    1.Female      1204859.60
                          2.Male        1227666.34
               2.Black    1.Female       307495.16
                          2.Male         263571.07
               3.Hispanic 1.Female       320359.68
                          2.Male         392902.80
               4.Asian    1.Female        78615.49
                          2.Male          82341.54
               5.Other    1.Female        16134.33
                          2.Male          19365.76
           1   1.White    1.Female      1195134.70
                          2.Male        1195659.14
               2.Black    1.Female       328376.10
                          2.Male         383293.79
               3.Hispanic 1.Female       322862.58
                          2.Male         404322.04
               4.Asian    1.Female        79499.56
                          2.Male          73783.69
               5.Other    1.Female        20647.55
                          2.Male          24222.52
>>> unstackKey  = ['SurveyYear', 'Age', 'Gender']



>>> penultimateFormatNot11Columns = cellPopulations.unstack(unstackKey)
>>> penultimateFormatNot11Columns

           WeightAdjusted                                                                                                       ...                                                                                                          
SurveyYear           1996                                                                                                       ...          1997                                                                                            
Age                    0                     1                     2                     3                     4                ...            76                  77                  78                  79                   80           
Gender           1.Female     2.Male   1.Female     2.Male   1.Female     2.Male   1.Female     2.Male   1.Female     2.Male    ...      1.Female    2.Male  1.Female    2.Male  1.Female    2.Male  1.Female    2.Male   1.Female     2.Male
Race                                                                                                                            ...                                                                                                          
1.White        1204859.60 1227666.34 1195134.70 1195659.14 1197386.21 1288700.89 1251324.65 1307458.14 1236790.33 1374989.75    ...     764103.31 506844.04 702775.64 425705.16 666705.33 423419.49 577674.82 366109.58 3898404.40 2283771.11
2.Black         307495.16  263571.07  328376.10  383293.79  291976.23  326400.85  310870.61  323344.13  301025.43  323199.08    ...      68272.99  43254.98  50082.98  34347.45  50788.70  36772.29  31393.21  20720.47  366569.11  180108.23
3.Hispanic      320359.68  392902.80  322862.58  404322.04  344564.20  340702.86  303325.95  321065.53  382663.64  311911.38    ...      39084.04  17362.56  27507.45  18803.48  17619.95  24060.91  35665.78  23802.81  174972.00  105530.84
4.Asian          78615.49   82341.54   79499.56   73783.69   96289.08   88222.32   96411.97   92029.56   77070.10   90370.15    ...      30196.58  27745.90  18419.49  15406.79   7272.27  17891.33  18116.50   3606.67   57684.54   42662.74
5.Other          16134.33   19365.76   20647.55   24222.52   17469.53   27237.94   11220.90    6996.58   23640.43   14917.77    ...       4441.26       nan   1487.90   2845.89    522.43   2453.52    303.66   2982.57   18870.12    6232.88

1 Answer
User
#1 · Posted 2024-10-01 11:23:43

It seems to me that all you need is pivot_table.

To do that, call df.reset_index(inplace=True) after the groupby(), and then:

df.pivot_table(values='WeightAdjusted', index='Race', columns=['SurveyYear', 'Age', 'Gender'])
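Put together, the suggestion could look like the following minimal sketch. The five rows are the ones shown by `mydata.head()` in the question; the snake_case variable names and the name `wide` are chosen here for illustration.

```python
import pandas as pd

# The five sample rows printed by mydata.head() in the question.
mydata = pd.DataFrame({
    "SurveyYear": [1996, 1996, 1996, 1996, 1996],
    "Age": [39, 9, 8, 39, 33],
    "Race": ["1.White", "1.White", "1.White", "1.White", "3.Hispanic"],
    "Gender": ["1.Female", "2.Male", "2.Male", "2.Male", "1.Female"],
    "WeightAdjusted": [1039.13, 995.13, 775.66, 404.28, 404.28],
})

groupby_keys = ["SurveyYear", "Age", "Race", "Gender"]

# Sum the weights per cell, then turn the group keys back into columns.
cell_populations = (
    mydata.groupby(groupby_keys)
          .agg({"WeightAdjusted": "sum"})
          .reset_index()
)

# One row per Race, one column per (SurveyYear, Age, Gender) combination.
wide = cell_populations.pivot_table(
    values="WeightAdjusted",
    index="Race",
    columns=["SurveyYear", "Age", "Gender"],
)
print(wide)
```

`pivot_table` aggregates with the mean by default, which is harmless here because each cell already holds a single summed value after the `groupby`. Passing `aggfunc="sum"` to `pivot_table` on the raw data would probably make the separate `groupby` step unnecessary.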
