Assigning multiple values in pandas

Posted 2024-09-22 16:32:01

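I want to replace every zero value in SkinThickness with the mean for patients within a given Age range.

So I grouped the DataFrame by Age and computed the mean SkinThickness for each age group,

so that each value in the SkinThickness column can be set to the corresponding mean computed from the age groups: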

>>> ageSkinMean = df_clean.groupby("Age_Class")["SkinThickness"].mean()
>>> ageSkinMean

Age_Class
21-22 years     82.163399
23-25 years    103.171429
26-30 years     91.170254
31-38 years     80.133028
39-47 years     73.685851
48-58 years     89.130233
60+ years       40.899160
Name: Insulin, dtype: float64

The code I'm running now is too slow: looping with iterrows() takes too long.

import time

start = time.time()
for i, val in df_clean[df_clean.SkinThickness == 0].iterrows():
    # Positional lookup, as in the original val[7]; column 7 is presumably Age
    age = val.iloc[7]
    # ageSkinMean is indexed by Age_Class labels, so use .iloc for positions
    if age < 22:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[0]
    elif age < 25:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[1]
    elif age < 30:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[2]
    elif age < 38:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[3]
    elif age < 47:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[4]
    elif age < 58:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[5]
    else:
        df_clean.loc[i, "SkinThickness"] = ageSkinMean.iloc[6]
print(time.time() - start)

Does pandas offer an optimized way to make a block of code like this run faster?


1 Answer
User
#1 · Posted 2024-09-22 16:32:01

You can use the pandas transform function to replace the zero SkinThickness values with the mean of the matching age class:

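    # Mean SkinThickness per age class (zero rows are included in this mean)
    age_skin_thickness_mean = df_clean.groupby('Age_Class')['SkinThickness'].mean()

    def replace_with_mean_thickness(row):
        # Look up the mean for this row's age class
        row['SkinThickness'] = age_skin_thickness_mean[row['Age_Class']]
        return row

    # Rewrite only the rows whose SkinThickness is 0
    zero_mask = df_clean['SkinThickness'] == 0
    df_clean.loc[zero_mask] = df_clean.loc[zero_mask].transform(replace_with_mean_thickness, axis=1)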

Every row in df_clean where SkinThickness == 0 will now have its SkinThickness set to the mean of its age group.
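
Since the question is about speed, it may be worth noting that the same replacement can probably be done without any row-wise function. The following is only a sketch, not part of the original answer: it assumes df_clean already has the Age_Class and SkinThickness columns, and the toy data below is made up for illustration. It uses groupby(...).transform("mean") to broadcast each group's mean back onto the rows, then Series.mask to overwrite only the zeros.

```python
import pandas as pd

# Hypothetical toy data standing in for df_clean (values are made up)
df_clean = pd.DataFrame({
    "Age_Class": ["21-22 years", "21-22 years", "21-22 years",
                  "60+ years", "60+ years"],
    "SkinThickness": [20.0, 0.0, 40.0, 0.0, 30.0],
})

# Per-row mean of the row's own age class, aligned with df_clean's index.
# As in the answer above, zero rows are included when computing the mean.
group_mean = df_clean.groupby("Age_Class")["SkinThickness"].transform("mean")

# Where SkinThickness is 0, take the group mean; keep other values as they are
is_zero = df_clean["SkinThickness"] == 0
df_clean["SkinThickness"] = df_clean["SkinThickness"].mask(is_zero, group_mean)

print(df_clean)
```

Because everything here operates on whole columns, it should scale much better than iterrows() or a per-row transform. If the zeros are really missing measurements, one might prefer to exclude them from the mean (for example by converting them to NaN before grouping), but that changes the result relative to the answer above.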
