按数据帧分组并输入缺失值

2024-04-23 04:41:55 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在创建一个新的计算列,但是,我想排除某些行,并计算新列的值。然后,为了填充排除行的值,我想用df1.groupby['City']对数据帧进行子集,并计算平均值以应用回原始数据帧中的new col

   df1 = pd.DataFrame({
                        'City': ['SF','NYC','SF','NYC','SF','CHI','LA','LA','CHI'], 
                        'Val': [2,4,0,0,7,4,3,5,6],
                        'Val1': [1,1,1,1,1,1,1,1,1] 
                       })

 
   df1['new col'] = np.nan

   for name, group in df.groupby(['City']):

       # Create Boolean Index
       m = group['Val'] != 0

       # Calculate rev_per_unit from grouped dataframe `group`
       stat = group.loc[m, 'Val'] / group.loc[m, 'Val1']


       # Assign stat to `df` at index locations where `Val` is 0 in group
       df.loc[group[~m].index, ''] = group['new col'].mean()

预期产出:

df1 = pd.DataFrame({
                        'City': ['SF','NYC','SF','NYC','SF','CHI','LA','LA','CHI'], 
                        'Val': [2,4,0,0,7,4,3,5,6],
                        'Val1': [1,1,1,1,1,1,1,1,1],
                        'new_col: [2,4,4.5,2,7,4,3,5,6]
                       })

新列是一个df1['Val] / df1['Val1']的计算列,不包括df['Val'] == 0,然后我将new_col作为分组平均值为df['Val'] == 0的行填充new_col


1条回答
网友
1楼 · 发布于 2024-04-23 04:41:55

第一组new_colVal/Val1

df1['new_col'] = df1['Val'] / df1['Val1']

然后使用^{}计算分组平均值(将0设置为nan忽略这些行):

means = df1['new_col'].replace(0, np.nan).groupby(df1['City']).transform('mean')

# [4.5, 4.0, 4.5, 4.0, 4.5, 5.0, 4.0, 4.0, 5.0]
# Length: 9, dtype: float64

最后,使用分组平均值对0个位置进行插补:

  • 或者使用^{}

    df1.loc[df1['Val'].eq(0), 'new_col'] = means
    
  • ^{}

    df1['new_col'] = df1['new_col'].mask(df1['Val'].eq(0), means)
    
  • ^{}

    df1['new_col'] = df1['new_col'].replace(0, np.nan).fillna(means)
    

总之:

df1['new_col'] = df1['Val'] / df1['Val1']
means = df1['new_col'].replace(0, np.nan).groupby(df1['City']).transform('mean')
df1.loc[df1['Val'].eq(0), 'new_col'] = means

#   City  Val  Val1  new_col
# 0   SF    2     1      2.0
# 1  NYC    4     1      4.0
# 2   SF    0     1      4.5
# 3  NYC    0     1      4.0
# 4   SF    7     1      7.0
# 5  CHI    4     1      4.0
# 6   LA    3     1      3.0
# 7   LA    5     1      5.0
# 8  CHI    6     1      6.0

相关问题 更多 >