Exporting or creating a dataset with imputed or transformed variables

Posted 2024-05-12 19:24:52

I am running KNN on my dataset. To do that I have to impute the missing values and then transform the variables so that they lie between 0 and 1.

I need to use the predicted results as inferred performance and then build a TTD model on the same data.

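When I call predict I can get the predicted values, but I cannot attach those results to the base dataset so that they can be used as inferred performance.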

Please find the sample code below:

import numpy
import pandas
from sklearn import preprocessing
from sklearn.impute import SimpleImputer  # replaces Imputer, removed in scikit-learn 0.22
from sklearn.neighbors import KNeighborsClassifier

train = pandas.read_csv("dev_in.csv")
y_train = train['Y']
w_train = train['WT']
# -1 marks a missing value
x_train1 = train[['ABC', 'GEF', 'XYZ']].replace(-1, numpy.nan)
values = x_train1.values
imputer = SimpleImputer(strategy='mean')
# replace missing values with the column mean
x_train_trf = imputer.fit_transform(values)
# count the number of NaN values in each column
print(numpy.isnan(x_train_trf).sum(axis=0))
X_normalized = preprocessing.normalize(x_train_trf, norm='l2')

# similar data manipulations on the test population
test = pandas.read_csv("oot_in.csv")
y_test = test['Y']
w_test = test['WT']
x_test1 = test[['ABC', 'GEF', 'XYZ']].replace(-1, numpy.nan)
print(numpy.isnan(x_test1).sum())
values_test = x_test1.values
# reuse the column means learned on the training data
x_test_trf = imputer.transform(values_test)
# count the number of NaN values in each column
print(numpy.isnan(x_test_trf).sum(axis=0))
X_normalized_test = preprocessing.normalize(x_test_trf, norm='l2')

# fitting the KNN
knn = KNeighborsClassifier(n_neighbors=5, weights='distance', p=2)
knn.fit(X_normalized, y_train)

# checking prediction on the test population
y_pred_test = knn.predict(X_normalized_test)
# this is the line that raises the error shown below
test['inferred'] = y_pred_test
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-82-defc045e7eeb> in <module>()
----> 1 test['inferred'] = y_pred_test

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I get the error above at the point where I try to create the inferred variable in the test dataset.

Any help would be greatly appreciated.
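
For reference, the IndexError text quoted in the question is the message NumPy raises when an array is indexed with a string. That suggests `test` may have been a NumPy array rather than a pandas DataFrame when the failing line ran, although the post does not confirm this. The following is a minimal sketch of one way to keep the DataFrame intact and attach the predictions to it. The toy frames stand in for dev_in.csv and oot_in.csv, and the `inferred_prob` column and the `model` name are assumptions introduced for illustration. It also swaps in `MinMaxScaler`, which rescales each column to the 0 to 1 range seen in training, in place of `preprocessing.normalize`, which rescales each row to unit length.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

FEATURES = ["ABC", "GEF", "XYZ"]

# Hypothetical toy data standing in for dev_in.csv and oot_in.csv.
train = pd.DataFrame({
    "ABC": [1.0, 2.0, -1.0, 4.0, 5.0, 6.0],
    "GEF": [10.0, -1.0, 30.0, 40.0, 50.0, 60.0],
    "XYZ": [0.1, 0.2, 0.3, -1.0, 0.5, 0.6],
    "Y": [0, 0, 0, 1, 1, 1],
})
test = pd.DataFrame({
    "ABC": [1.5, -1.0, 5.5],
    "GEF": [15.0, 35.0, -1.0],
    "XYZ": [0.15, 0.35, 0.55],
})

# Imputation, scaling and KNN are chained so that the statistics
# learned on the training data are reused unchanged on the test data.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    MinMaxScaler(),
    KNeighborsClassifier(n_neighbors=3, weights="distance", p=2),
)

# -1 marks a missing value; replace() returns a copy, so the
# original DataFrames are left untouched.
x_train = train[FEATURES].replace(-1, np.nan)
x_test = test[FEATURES].replace(-1, np.nan)
model.fit(x_train, train["Y"])

# `test` is still a DataFrame here, so string column assignment works.
test["inferred"] = model.predict(x_test)
# Probability of class 1 (second column of predict_proba).
test["inferred_prob"] = model.predict_proba(x_test)[:, 1]
```

Because the predictions come back in the same row order as `x_test`, assigning them as new columns keeps each one next to its source row. Calling `test.to_csv(...)` afterwards would write the scored dataset out for later use.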

