How do I put all DataFrame rows that share a value in a column into one row?

Posted 2024-05-01 18:08:42


I have a DataFrame that looks like this:

                    #   Year    Player          PTSN    AVGN    
ThisYear                            
2018Aaron Donald    1   2018    Aaron Donald    280.60  17.538  
2018J.J. Watt       2   2018    J.J. Watt       259.80  16.238  
2018Danielle Hunter 3   2018    Danielle Hunter 237.60  14.850  
2017Aaron Donald    8   2017    Aaron Donald    181.0   12.929  
2017Danielle Hunter 20  2017    Danielle Hunter 133.2   8.325
2016Danielle Hunter 2   2016    Danielle Hunter 204.6   12.788

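What I want to do is rearrange the columns and rows so that I can run a regression on them, comparing each player's previous year with the following year. (If you know a better way to do what I'm asking, please tell me.)

The end result I want looks like this: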

Player          PTSN     AVGN      PTSNN1      AVGNN1
Aaron Donald    280.60   17.538    181.0       12.929

How can I do this? Or is there a better way to get the result I want?


1 answer
User
#1 · Posted 2024-05-01 18:08:42

New answer: set up "current" and "previous" columns to correlate

import pandas as pd

# Same setup as in the old answer
df = pd.DataFrame({'#': [1, 2, 3, 8, 20, 2],
 'AVGN': [17.538, 16.238, 14.85, 12.929, 8.325, 12.788],
 'PTSN': [280.6, 259.8, 237.6, 181.0, 133.2, 204.6],
 'Player': ['Aaron Donald',
            'J.J. Watt',
            'Danielle Hunter',
            'Aaron Donald',
            'Danielle Hunter',
            'Danielle Hunter'],
 'Year': [2018, 2018, 2018, 2017, 2017, 2016]})

# Index by player and year, but do not unstack the MultiIndex this time
res = df.set_index(['Player', 'Year'])[['AVGN', 'PTSN']]

# Build a MultiIndex of all players by all years
idx = pd.MultiIndex.from_product([df['Player'].unique(), 
                                  df['Year'].unique()],
                                 names=['Player', 'Year'])

# Introduce a row of NaN values for any combination of 
# player and year not in the original DataFrame
res = res.reindex(idx).sort_index()

res[['AVGN_prev', 'PTSN_prev']] = res.groupby('Player')[['AVGN', 'PTSN']].shift()

res
                        AVGN   PTSN  AVGN_prev  PTSN_prev
Player          Year                                     
Aaron Donald    2016     NaN    NaN        NaN        NaN
                2017  12.929  181.0        NaN        NaN
                2018  17.538  280.6     12.929      181.0
Danielle Hunter 2016  12.788  204.6        NaN        NaN
                2017   8.325  133.2     12.788      204.6
                2018  14.850  237.6      8.325      133.2
J.J. Watt       2016     NaN    NaN        NaN        NaN
                2017     NaN    NaN        NaN        NaN
                2018  16.238  259.8        NaN        NaN
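
To go from this table to something you can correlate, one option is to keep only the rows that have both a current and a previous season. The following is a minimal sketch under that assumption, not the only way to do it. The names `prev`, `pairs` and `corr` are introduced here for illustration, and `join` with `add_suffix` is used in place of the direct column assignment above; the result should be the same.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['Aaron Donald', 'J.J. Watt', 'Danielle Hunter',
               'Aaron Donald', 'Danielle Hunter', 'Danielle Hunter'],
    'Year': [2018, 2018, 2018, 2017, 2017, 2016],
    'AVGN': [17.538, 16.238, 14.85, 12.929, 8.325, 12.788],
    'PTSN': [280.6, 259.8, 237.6, 181.0, 133.2, 204.6],
})

# One row per (player, year), including combinations missing from the data,
# so that shifting by one row always means "one year earlier".
res = df.set_index(['Player', 'Year'])[['AVGN', 'PTSN']]
idx = pd.MultiIndex.from_product(
    [df['Player'].unique(), df['Year'].unique()], names=['Player', 'Year'])
res = res.reindex(idx).sort_index()

# Previous-year values per player, attached as *_prev columns.
prev = res.groupby(level='Player')[['AVGN', 'PTSN']].shift().add_suffix('_prev')
res = res.join(prev)

# Keep only seasons where both the current and the previous year exist.
pairs = res.dropna()

# Pearson correlation between this year's and last year's points.
corr = pairs['PTSN'].corr(pairs['PTSN_prev'])
print(pairs)
print(corr)
```

With the sample data only three player-seasons survive the `dropna()`, so the correlation is not meaningful by itself; the point is the shape of `pairs`, which has one row per observation and can be passed to whatever regression tool you prefer.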

Old answer

Set the index to ['Player', 'Year'], then unstack the inner level into columns:

import pandas as pd

# Simplified version of your example DataFrame
df = pd.DataFrame({'#': [1, 2, 3, 8, 20, 2],
 'AVGN': [17.538, 16.238, 14.85, 12.929, 8.325, 12.788],
 'PTSN': [280.6, 259.8, 237.6, 181.0, 133.2, 204.6],
 'Player': ['Aaron Donald',
            'J.J. Watt',
            'Danielle Hunter',
            'Aaron Donald',
            'Danielle Hunter',
            'Danielle Hunter'],
 'Year': [2018, 2018, 2018, 2017, 2017, 2016]})

res = df.set_index(['Player', 'Year'])[['AVGN', 'PTSN']].unstack()

res
                   AVGN                   PTSN              
Year               2016    2017    2018   2016   2017   2018
Player                                                      
Aaron Donald        NaN  12.929  17.538    NaN  181.0  280.6
Danielle Hunter  12.788   8.325  14.850  204.6  133.2  237.6
J.J. Watt           NaN     NaN  16.238    NaN    NaN  259.8

At this point, the columns are a MultiIndex. To flatten the columns:

# Convert integer years to strings
oldcols = res.columns
res.columns = oldcols.set_levels([oldcols.levels[0],
                                  oldcols.levels[1].astype(str)])

# Flatten columns
res.columns = ['_'.join(col) for col in res.columns.values]

res
                 AVGN_2016  AVGN_2017  AVGN_2018  PTSN_2016  PTSN_2017  PTSN_2018
Player                                                                           
Aaron Donald           NaN     12.929     17.538        NaN      181.0      280.6
Danielle Hunter     12.788      8.325     14.850      204.6      133.2      237.6
J.J. Watt              NaN        NaN     16.238        NaN        NaN      259.8
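
As a possible shortcut, the two flattening steps can be folded into one by formatting each (statistic, year) pair directly, which avoids converting the year level to strings first. This is a sketch of an alternative, not a replacement for the approach above; the variable name `wide` is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['Aaron Donald', 'J.J. Watt', 'Danielle Hunter',
               'Aaron Donald', 'Danielle Hunter', 'Danielle Hunter'],
    'Year': [2018, 2018, 2018, 2017, 2017, 2016],
    'AVGN': [17.538, 16.238, 14.85, 12.929, 8.325, 12.788],
    'PTSN': [280.6, 259.8, 237.6, 181.0, 133.2, 204.6],
})

# Unstack the Year level into columns, giving (statistic, year) column pairs.
wide = df.set_index(['Player', 'Year'])[['AVGN', 'PTSN']].unstack()

# Each column label is a tuple such as ('AVGN', 2016); the f-string
# converts the integer year to text while joining.
wide.columns = [f'{stat}_{year}' for stat, year in wide.columns]

print(wide)
```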
