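I am working with a Kaggle dataset and trying out different imputation techniques, and I am getting an error from the miceforest package that I do not understand.
The full Colab is here, but the gist is that the data I am using are the numeric features in my X_train dataset: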
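import miceforest as mf
from sklearn.model_selection import train_test_split

# df_agg is the aggregated Kaggle DataFrame built earlier in the notebook
X = df_agg.drop(['TARGET', 'SK_ID_CURR'], axis=1)
y = df_agg.TARGET

# Hold out 10% of the data as the test set
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y)

# Split 1/9 of the remainder off as the dev set (80/10/10 overall)
X_train_raw, X_dev_raw, y_train, y_dev = train_test_split(
    X_train_raw, y_train,
    test_size=1/9.,
    random_state=42,
    stratify=y_train
)

# Keep only the numeric columns for imputation
num_features = X_train_raw.select_dtypes(include=['int64', 'float64']).columns

kernel = mf.MultipleImputedKernel(
    data=X_train_raw[num_features],
    save_all_iterations=True,
    random_state=1991
)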
Here is the error:
ValueError Traceback (most recent call last)
<ipython-input-20-b6bd23e78d87> in <module>
2 data=X_train_raw[num_features],
3 save_all_iterations=True,
----> 4 random_state=1991
5 )
~/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/miceforest/MultipleImputedKernel.py in __init__(self, data, datasets, variable_schema, mean_match_candidates, save_all_iterations, save_models, random_state)
56 save_all_iterations=save_all_iterations,
57 save_models=save_models,
---> 58 random_state=random_state,
59 )
60 )
~/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/miceforest/KernelDataSet.py in __init__(self, data, variable_schema, mean_match_candidates, save_all_iterations, save_models, random_state)
88 mean_match_candidates=mean_match_candidates,
89 save_all_iterations=save_all_iterations,
---> 90 random_state=random_state,
91 )
92
~/anaconda3/envs/tensorflow2_latest_p37/lib/python3.7/site-packages/miceforest/ImputedDataSet.py in __init__(self, data, variable_schema, mean_match_candidates, save_all_iterations, random_state)
87 self.imputation_values[var] = {
88 0: self._random_state.choice(
---> 89 data[var].dropna(), size=self.na_counts[var]
90 )
91 }
mtrand.pyx in numpy.random.mtrand.RandomState.choice()
ValueError: 'a' cannot be empty unless no samples are taken
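I am the maintainer of this package. Could you send me a link to the Kaggle data you are running this on? As a quick check: when you pass the data to this function, can you make sure that no column in the data is 100% missing? This error occurs when an empty object is passed to numpy.random.choice(), so data[var].dropna() must be empty.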
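The quick check suggested above could look something like the sketch below. It uses a small made-up DataFrame in place of X_train_raw[num_features]; the helper name all_missing_columns and the toy column names are assumptions for illustration only and are not part of miceforest. It lists the columns that are entirely missing, drops them, and shows that sampling from an empty, fully missing column with RandomState.choice raises a ValueError like the one in the traceback.

```python
import numpy as np
import pandas as pd


def all_missing_columns(df):
    """Return the names of columns in which every value is missing.

    Hypothetical helper for illustration; not part of miceforest.
    """
    return [col for col in df.columns if df[col].isna().all()]


# Toy stand-in for X_train_raw[num_features] (made-up data)
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],  # 100% missing
    "c": [4, 5, 6],
})

empty_cols = all_missing_columns(toy)
print("Columns with no observed values:", empty_cols)

# Drop the fully missing columns before handing the data to the imputer
cleaned = toy.drop(columns=empty_cols)

# Sampling from a column with no observed values fails inside NumPy,
# which mirrors the call shown in the traceback.
rng = np.random.RandomState(1991)
observed = toy["b"].dropna().to_numpy()   # empty array
n_missing = int(toy["b"].isna().sum())    # 3 values would need imputing
try:
    rng.choice(observed, size=n_missing)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print("NumPy error:", error_message)
```

If empty_cols comes back non-empty on the real data, dropping those columns (or excluding them from num_features) before constructing the kernel would be one way to avoid the error, assuming nothing else in the data triggers it.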