How do I replace each NaN in a column with a different random value using pandas?

import numpy as np
import pandas as pd

df = pd.read_csv("testfile.csv", header=None)
mu, sigma = df.mean(), df.std()
# only ONE value is drawn here ...
norm_dist = np.random.normal(mu, sigma, 1)
for i in norm_dist:
    # ... so fillna puts that same value into every NaN
    print(df.fillna(i))
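As a side note on why the attempt above gives the same number everywhere: `fillna` with a scalar reuses that one value for every NaN, whereas passing a Series fills each NaN with the value at the same index label. A minimal sketch, assuming a small hypothetical Series and a seeded `np.random.default_rng` generator for reproducibility:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one column with three missing values
s = pd.Series([343.0, 483.0, 101.0, np.nan, np.nan, np.nan])

rng = np.random.default_rng(0)  # seeded so the run is reproducible
mu, sigma = s.mean(), s.std()   # NaNs are skipped by default

# Scalar fill: one draw is reused for every NaN (what the attempt above does)
same = s.fillna(rng.normal(mu, sigma))

# Series fill: values are matched to NaNs by index label, so each NaN gets its own draw
draws = pd.Series(rng.normal(mu, sigma, size=len(s)), index=s.index)
different = s.fillna(draws)

print(same.tolist())
print(different.tolist())
```

Only the NaN positions take values from `draws`; the existing values are left as they are.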

3 answers

User

Answer 1 · edited 2024-09-29 23:33:22

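Replacing missing values in a pandas DataFrame column with random values is simple. First compute the column's mean and standard deviation, then define a function that returns a fresh Gaussian draw for a NaN and leaves any other value unchanged: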

import numpy as np
import pandas as pd

mean = df['column'].mean()
std = df['column'].std()

def fill_missing_from_Gaussian(column_val):
    # NaN: return a new scalar draw from N(mean, std); otherwise keep the value
    if np.isnan(column_val):
        return np.random.normal(mean, std)
    return column_val

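Now just apply the function to the column that has missing values. Because `apply` calls it once per element, each NaN gets its own random draw: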

df['column'] = df['column'].apply(fill_missing_from_Gaussian)
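A self-contained variant of the same `apply` approach, sketched under two assumptions: the column is named `"column"` (hypothetical sample data), and a seeded `np.random.default_rng` generator is passed in instead of relying on the global `np.random` state, so results are reproducible. The helper name `fill_column_from_gaussian` is likewise hypothetical.

```python
import numpy as np
import pandas as pd

def fill_column_from_gaussian(series, rng):
    mean, std = series.mean(), series.std()  # NaNs are skipped by default

    def draw_if_missing(val):
        # apply() calls this once per element, so each NaN gets its own draw
        return rng.normal(mean, std) if np.isnan(val) else val

    return series.apply(draw_if_missing)

# Hypothetical data with three missing values
df = pd.DataFrame({"column": [343.0, 483.0, 101.0, np.nan, np.nan, np.nan]})
df["column"] = fill_column_from_gaussian(df["column"], np.random.default_rng(0))
print(df)
```

This calls a Python function per element, which is fine for small frames; the mask-based answers below do the same work in one vectorised call.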

User

Answer 2 · edited 2024-09-29 23:33:22

I think you need:

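import numpy as np
import pandas as pd

# df has a single column labelled 0 (read with header=None)
mu, sigma = df[0].mean(), df[0].std()
# boolean mask of NaNs
a = df[0].isnull()
# draw one value per NaN: summing the mask counts the Trues (True counts as 1)
norm_dist = np.random.normal(mu, sigma, a.sum())
print(norm_dist)
[ 184.90581318  364.89367364  181.46335348]
# assign the values through the mask
df.loc[a, 0] = norm_dist
print(df)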

            0
0  343.000000
1  483.000000
2  101.000000
3  184.905813
4  364.893674
5  181.463353
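The same mask-and-assign idea extends to several columns by repeating it per numeric column, so each column is filled from its own mean and standard deviation. A minimal sketch, assuming hypothetical column names `a` and `b`, a seeded `np.random.default_rng` generator, and a hypothetical helper name `fill_nan_per_column`; it works on a copy rather than modifying the input in place.

```python
import numpy as np
import pandas as pd

def fill_nan_per_column(df, rng):
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        mask = out[col].isna()  # True where the value is missing
        if not mask.any():
            continue  # nothing to fill in this column
        mu, sigma = out[col].mean(), out[col].std()  # NaNs are skipped by default
        # mask.sum() counts the Trues, i.e. how many draws are needed
        out.loc[mask, col] = rng.normal(mu, sigma, size=mask.sum())
    return out

# Hypothetical data: NaNs in two different columns
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, np.nan], "b": [10.0, 20.0, np.nan, 40.0]})
filled = fill_nan_per_column(df, np.random.default_rng(1))
print(filled)
```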

User

Answer 3 · edited 2024-09-29 23:33:22

Here is an approach that works on the underlying array data:

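import numpy as np
import pandas as pd

def fillNaN_with_unifrand(df):
    a = df.to_numpy(dtype=float, copy=True)  # writable float copy of the data
    m = np.isnan(a)  # mask of NaNs
    # per-column stats ignoring NaNs (ddof=1 matches df.std())
    mu, sigma = np.nanmean(a, axis=0), np.nanstd(a, axis=0, ddof=1)
    # despite the name, the draws are Gaussian; each NaN uses its own column's stats
    loc, scale = np.broadcast_to(mu, a.shape)[m], np.broadcast_to(sigma, a.shape)[m]
    a[m] = np.random.normal(loc, scale, size=m.sum())
    return pd.DataFrame(a, index=df.index, columns=df.columns)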

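Essentially, we use `np.random.normal` with its `size` parameter to generate all the random numbers in one go, and then assign them all at once through the mask of NaNs.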
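Stripped of pandas, the core of this technique is one vectorised draw of exactly `m.sum()` values written back through a boolean mask. A minimal sketch on a plain 1-D NumPy array, assuming a hypothetical helper `fill_nan_gaussian`, hypothetical sample data, and a seeded `np.random.default_rng` generator; it uses the mean and sample standard deviation of the whole array rather than per-column statistics.

```python
import numpy as np

def fill_nan_gaussian(a, rng):
    out = np.array(a, dtype=float)  # float copy, so the caller's array is untouched
    m = np.isnan(out)               # boolean mask of NaNs
    mu, sigma = np.nanmean(out), np.nanstd(out, ddof=1)  # stats of the non-NaN values
    # a single call produces exactly m.sum() draws, written back through the mask
    out[m] = rng.normal(mu, sigma, size=m.sum())
    return out

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])
filled = fill_nan_gaussian(arr, np.random.default_rng(42))
print(filled)
```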

