Replace null values with a random max/min value

Posted 2024-09-28 18:58:16


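I want to replace the null values in the column "Lower Confidence Interval" with random values, where each replacement must be either the minimum or the maximum of that same column. I tried the following, but it doesn't work. What am I missing?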

import random
import numpy as np

minimo = np.nanmin(df_copy['Lower Confidence Interval'])
maximo = np.nanmax(df_copy['Lower Confidence Interval'])

# My attempt: replace each missing value (stored as 0) with the min or the max.
for value in df['Lower Confidence Interval']:
    if value == 0:
        value.fill_na(np.random.choice([minimo, maximo]))

3 answers

Import the modules:

import pandas as pd
import numpy as np
import random

Create some sample data:

df_copy = pd.DataFrame({'Lower Confidence Interval':[0,1,np.nan,8]})
minimo = np.nanmin(df_copy['Lower Confidence Interval']) 
maximo = np.nanmax(df_copy['Lower Confidence Interval'])

Count the number of NaN values:

cnt=df_copy['Lower Confidence Interval'].isnull().sum() 

Replace each NaN with a randomly chosen minimum or maximum:

# Select the NaN rows AND the column, so only this column is overwritten
df_copy.loc[df_copy['Lower Confidence Interval'].isnull(), 'Lower Confidence Interval'] = np.random.choice([minimo, maximo], cnt)
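The steps above can be collected into one self-contained sketch. It uses `.loc` with the NaN mask plus the column name, so only that column is written, and draws one min/max choice per missing cell. The function name `fill_nan_with_extremes` and its `seed` parameter are hypothetical additions for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd


def fill_nan_with_extremes(df, column, seed=None):
    """Hypothetical helper: return a copy of df where each NaN in `column`
    is replaced by either the column's minimum or maximum, chosen at random."""
    out = df.copy()
    col = out[column]
    lo, hi = col.min(), col.max()  # pandas min/max skip NaN by default
    mask = col.isna()
    rng = np.random.default_rng(seed)
    # One independent draw per missing cell, written only into `column`.
    out.loc[mask, column] = rng.choice([lo, hi], size=int(mask.sum()))
    return out


df_copy = pd.DataFrame({'Lower Confidence Interval': [0, 1, np.nan, 8]})
filled = fill_nan_with_extremes(df_copy, 'Lower Confidence Interval', seed=0)
print(filled)
```

Working on a copy keeps the original frame unchanged, which makes it easy to compare before and after.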

I'd suggest using map with a lambda function:

minimo = np.nanmin(df_copy['Lower Confidence Interval']) 
maximo = np.nanmax(df_copy['Lower Confidence Interval'])

df_copy['Lower Confidence Interval'] = df_copy['Lower Confidence Interval'].map(
    lambda x: np.random.choice([minimo, maximo]) if np.isnan(x) else x
)

If you're not familiar with this, it basically translates to the following pseudocode:

for each element x of df_copy['Lower Confidence Interval']
   x = lambda(x)

function lambda(x)
   if np.isnan(x)
      return np.random.choice([minimo, maximo])
   else
      return x
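Written out as ordinary Python, the pseudocode corresponds roughly to the explicit loop below. It is only meant to make the control flow visible; `map` does the same thing more concisely. Note that the loop just builds a new list, and the column changes only when that list is assigned back. The helper name `pick` and the sample data are illustrative.

```python
import numpy as np
import pandas as pd

df_copy = pd.DataFrame({'Lower Confidence Interval': [0, 1, np.nan, 8]})
minimo = np.nanmin(df_copy['Lower Confidence Interval'])
maximo = np.nanmax(df_copy['Lower Confidence Interval'])


def pick(x):
    # Same logic as the lambda: NaN -> random extreme, otherwise unchanged.
    if np.isnan(x):
        return np.random.choice([minimo, maximo])
    return x


new_values = []
for x in df_copy['Lower Confidence Interval']:
    new_values.append(pick(x))

# Assigning the list back is what actually updates the DataFrame.
df_copy['Lower Confidence Interval'] = new_values
print(df_copy)
```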

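The error is that when you iterate over the column, each value is a float rather than a DataFrame object, so fill_na cannot work on it (the pandas method is actually spelled fillna, and it exists on Series and DataFrame objects, not on individual floats). Using the same idea as in your question, I wrote a replacement function and applied it to the column with pandas: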

import pandas as pd
import numpy as np
df_dict = {"Lower Confidence Interval":[1,2,2.5,6,5.5,np.nan, np.nan]}
df = pd.DataFrame.from_dict(df_dict)

Output:

df
   Lower Confidence Interval
0                        1.0
1                        2.0
2                        2.5
3                        6.0
4                        5.5
5                        NaN
6                        NaN

The rest of the code:

minimo = np.nanmin(df['Lower Confidence Interval'])
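maximo = np.nanmax(df['Lower Confidence Interval'])

# Assuming you have already filled the NaNs with 0, as the `value == 0` check
# in the question implies. The column stays float, so the fills are 0.0.
# Caveat: any genuine 0 in the data will also be replaced below.
df.fillna(0, inplace=True)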

def replace_na(x):
    if x == float(0):
        return np.random.choice([minimo, maximo])
    else:
        return x
df["Lower Confidence Interval"] = df["Lower Confidence Interval"].apply(replace_na)

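After replacing the NaNs with the column's minimum or maximum, the output is (one possible result, since the choice is random):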

df
Out[15]: 
   Lower Confidence Interval
0                        1.0
1                        2.0
2                        2.5
3                        6.0
4                        5.5
5                        1.0
6                        6.0
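A variant of the same idea, sketched as an alternative rather than as the answer's own code: iterating over a column yields scalar floats, which have no `fillna`, but the Series itself does, and `Series.fillna` accepts a Series of replacement values aligned on the index. Checking for NaN directly also avoids the `fillna(0)` step, which makes a genuine 0 indistinguishable from a missing value. The helper name `fill_with_extremes`, the seed, and the sample data (which deliberately includes a real 0) are illustrative assumptions; passing a seed makes the random picks reproducible.

```python
import numpy as np
import pandas as pd


def fill_with_extremes(s, seed=None):
    """Hypothetical helper: fill each NaN in s with s.min() or s.max(), at random."""
    rng = np.random.default_rng(seed)  # same seed -> same picks
    picks = pd.Series(rng.choice([s.min(), s.max()], size=len(s)), index=s.index)
    # fillna with a Series takes the value at the same index label for each NaN;
    # non-missing values (including a genuine 0) are left untouched.
    return s.fillna(picks)


df = pd.DataFrame({"Lower Confidence Interval": [0, 2, 2.5, 6, 5.5, np.nan, np.nan]})

# Iterating gives scalar floats, which have no fillna method.
print(all(not hasattr(v, "fillna") for v in df["Lower Confidence Interval"]))

df["Lower Confidence Interval"] = fill_with_extremes(df["Lower Confidence Interval"], seed=42)
print(df)
```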
