Imputing with miceforest and getting an error on RandomState.choice()

Posted 2024-09-27 00:15:40


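I'm working with a Kaggle dataset and trying out different imputation techniques, and I'm getting an error from the miceforest package that I don't understand.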

The full notebook is on Colab, but the gist is that the data I'm passing in is the numeric features of my X_train dataset:

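import miceforest as mf
from sklearn.model_selection import train_test_split

# df_agg is my dataframe, built earlier in the notebook (not shown)
X = df_agg.drop(['TARGET', 'SK_ID_CURR'], axis=1)
y = df_agg.TARGET

# Hold out 10% of the rows as the test set
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
  X, y, test_size=0.10, random_state=42, stratify=y)

# Split 1/9 of the remaining 90% off as a dev set (10% of the total)
X_train_raw, X_dev_raw, y_train, y_dev = train_test_split(
  X_train_raw, y_train,
  test_size=1/9.,
  random_state=42,
  stratify=y_train
)
num_features = X_train_raw.select_dtypes(include=['int64', 'float64']).columns

# Impute only the numeric training features
kernel = mf.MultipleImputedKernel(
  data=X_train_raw[num_features],
  save_all_iterations=True,
  random_state=1991
)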

Here is the error:

ValueError                                Traceback (most recent call last)
<ipython-input-20-b6bd23e78d87> in <module>
      2   data=X_train_raw[num_features],
      3   save_all_iterations=True,
----> 4   random_state=1991
      5 )

~/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/miceforest/MultipleImputedKernel.py in __init__(self, data, datasets, variable_schema, mean_match_candidates, save_all_iterations, save_models, random_state)
     56                 save_all_iterations=save_all_iterations,
     57                 save_models=save_models,
---> 58                 random_state=random_state,
     59             )
     60         )

~/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/miceforest/KernelDataSet.py in __init__(self, data, variable_schema, mean_match_candidates, save_all_iterations, save_models, random_state)
     88             mean_match_candidates=mean_match_candidates,
     89             save_all_iterations=save_all_iterations,
---> 90             random_state=random_state,
     91         )
     92 

~/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/miceforest/ImputedDataSet.py in __init__(self, data, variable_schema, mean_match_candidates, save_all_iterations, random_state)
     87             self.imputation_values[var] = {
     88                 0: self._random_state.choice(
---> 89                     data[var].dropna(), size=self.na_counts[var]
     90                 )
     91             }

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: 'a' cannot be empty unless no samples are taken
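The last miceforest frame in the traceback seeds every missing entry of a column with a random draw from that column's observed values, via `choice(data[var].dropna(), size=self.na_counts[var])`. Below is a minimal sketch, independent of miceforest and assuming only numpy and pandas, that mirrors that single call. The helper `random_initial_fill` is hypothetical and is not part of miceforest's API. A column that is entirely NaN leaves an empty pool to sample from while still requesting one draw per missing row, which raises the same ValueError.

```python
import numpy as np
import pandas as pd


def random_initial_fill(df: pd.DataFrame, seed: int = 1991) -> pd.DataFrame:
    """Hypothetical helper: fill each NaN with a random draw from the
    column's observed (non-NaN) values, like the traceback's last frame."""
    rng = np.random.RandomState(seed)
    filled = df.copy()
    for col in df.columns:
        missing = df[col].isna()
        na_count = int(missing.sum())
        if na_count == 0:
            continue  # nothing to fill in this column
        # Same call shape as in the traceback: choice(observed values, size=#NaNs).
        # If the column is 100% NaN, dropna() is empty and numpy raises ValueError.
        filled.loc[missing, col] = rng.choice(df[col].dropna(), size=na_count)
    return filled


ok = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, 2.0]})
print(random_initial_fill(ok).isna().sum().sum())  # 0

bad = ok.assign(c=np.nan)  # column "c" is entirely missing
try:
    random_initial_fill(bad)
except ValueError as err:
    print("ValueError:", err)
```

The exact wording of the error message can differ between numpy versions, but the failing condition is the same: an empty population combined with a non-zero sample size.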

1 Answer
User
Reply #1 · Posted 2024-09-27 00:15:40

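I'm the maintainer of this package. Could you send me a link to the Kaggle data you're running this on? As a quick check: before passing your data to this function, can you make sure it contains no columns that are 100% missing? This error occurs when an empty object is passed to numpy.random.choice(), so data[var].dropna() must be empty for at least one column.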
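One way to run the check the maintainer suggests is to list the columns whose values are all NaN and drop them before building the kernel. This is a sketch assuming pandas; the helper `drop_all_missing` and the toy frame `X_train_num` (standing in for `X_train_raw[num_features]`, with made-up column names `x1` to `x3`) are hypothetical and not taken from the original notebook.

```python
import numpy as np
import pandas as pd


def drop_all_missing(df: pd.DataFrame):
    """Hypothetical helper: return df without its 100%-NaN columns,
    plus the list of column names that were dropped."""
    all_missing = df.isna().all()  # True for columns with no observed values
    dropped = all_missing[all_missing].index.tolist()
    return df.drop(columns=dropped), dropped


# Toy stand-in for X_train_raw[num_features]; the real Kaggle columns are not reproduced.
X_train_num = pd.DataFrame({
    "x1": [1.0, np.nan, 3.0, 4.0],
    "x2": [np.nan, np.nan, np.nan, np.nan],  # entirely missing in this split
    "x3": [0, 1, 1, 2],
})

print(X_train_num.isna().mean())  # fraction of missing values per column
clean, dropped = drop_all_missing(X_train_num)
print(dropped)  # ['x2']

# `clean` is what would then be passed as data= to mf.MultipleImputedKernel(...)
# as in the question; `dropped` can be reused to remove the same columns
# from the dev and test splits so all three keep a consistent feature set.
```

Because the check is run on the training split, a sparse column can be flagged here even if a few of its values survive in the dev or test rows; reusing the `dropped` list keeps the feature sets aligned across splits.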
