Keras regression with Scikit-Learn StandardScaler, with and without a Pipeline

estimators = []
estimators.append(('standardise', StandardScaler()))
estimators.append(('multiLayerPerceptron', KerasRegressor(build_fn=build_nn, nb_epoch=num_epochs, batch_size=10, verbose=0)))
pipeline = Pipeline(estimators)
log = pipeline.fit(X_train, Y_train)
Y_deep = pipeline.predict(X_test)

1 answer

Anonymous user

Answer #1 · posted 2024-06-01 06:13:11

In the second case, you are calling StandardScaler.fit_transform() on both X_train and X_test. That is the wrong way to use it.

You should call fit_transform() on X_train and then only transform() on X_test, because that is what Pipeline does. As the Pipeline documentation states, Pipeline will:

fit():
Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator
predict():
Apply transforms to the data, and predict with the final estimator
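The following minimal sketch makes this concrete. It makes two assumptions to stay fast and deterministic: the Keras model is swapped for scikit-learn's LinearRegression, and the data is hypothetical random data. It checks that Pipeline.predict() gives the same predictions as manually scaling the test data with a scaler fitted only on the training data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: 100 training rows, 20 test rows, 3 features
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
Y_train = X_train @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
X_test = rng.normal(loc=8.0, scale=3.0, size=(20, 3))

# Pipeline: fit() runs fit_transform() on the scaler, predict() runs only transform()
# (LinearRegression stands in for KerasRegressor here)
pipeline = Pipeline([('standardise', StandardScaler()),
                     ('regressor', LinearRegression())])
pipeline.fit(X_train, Y_train)
y_pipe = pipeline.predict(X_test)

# Manual equivalent: fit the scaler on the training data only
scale = StandardScaler()
X_train_scaled = scale.fit_transform(X_train)
X_test_scaled = scale.transform(X_test)  # reuses the training mean/SD
model = LinearRegression().fit(X_train_scaled, Y_train)
y_manual = model.predict(X_test_scaled)

print(np.allclose(y_pipe, y_manual))
```

Because both paths perform the same operations in the same order, the predictions match.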

So, as you can see, it only applies transform() to the test data, not fit_transform().

To elaborate on my point, your code should be:

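from sklearn.preprocessing import StandardScaler
from keras.wrappers.scikit_learn import KerasRegressor

# build_nn, num_epochs, X_train, Y_train and X_test are defined as in the question
scale = StandardScaler()
X_train = scale.fit_transform(X_train)  # learn mean/SD from the training data, then scale it

# This is the change: only transform() the test data, reusing the training mean/SD
X_test = scale.transform(X_test)

# epochs is the Keras 2 name of the old nb_epoch argument
model_np = KerasRegressor(build_fn=build_nn, epochs=num_epochs, batch_size=10, verbose=0)
log = model_np.fit(X_train, Y_train)
Y_deep = model_np.predict(X_test)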

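Calling fit() or fit_transform() on the test data wrongly scales it with a different mean and standard deviation from those used for the training data. That is the source of the difference in predictions.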
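Here is a small illustration, assuming hypothetical synthetic data whose test set is deliberately shifted. It compares scaling the test data with the training statistics against refitting the scaler on the test data itself.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: the test set is deliberately shifted and wider
rng = np.random.default_rng(1)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_test = rng.normal(loc=3.0, scale=2.0, size=(50, 2))

scale = StandardScaler().fit(X_train)
X_test_right = scale.transform(X_test)                 # training mean/SD
X_test_wrong = StandardScaler().fit_transform(X_test)  # the test set's own mean/SD

# The wrongly scaled test data is always re-centred at 0 with unit SD,
# which hides the shift the model would otherwise see.
print(X_test_right.mean(axis=0))
print(X_test_wrong.mean(axis=0))
print(np.allclose(X_test_right, X_test_wrong))
```

The model was trained on features expressed in units of the training mean/SD, so only X_test_right is on the scale it expects.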

Edit: to answer the question in the comments:

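fit_transform() is just a shortcut that performs fit() and then transform(). For StandardScaler, fit() does not return any transformed data (it returns the fitted scaler itself). It only learns the mean and standard deviation of the data. transform() then applies what was learned to the data and returns the new, scaled data.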
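A quick check of these three claims on a tiny hand-made array (the values are arbitrary, for illustration only): fit() returns the scaler, transform() computes (X - mean) / SD, and fit_transform() equals fit() followed by transform().

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

# fit() learns mean_ and scale_ and returns the scaler itself, not data
scale = StandardScaler()
returned = scale.fit(X)
print(returned is scale)   # True
print(scale.mean_)         # per-column mean
print(scale.scale_)        # per-column population SD (ddof=0)

# transform() applies (X - mean_) / scale_
by_hand = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(scale.transform(X), by_hand))

# fit_transform() is the shortcut for fit() followed by transform()
print(np.allclose(StandardScaler().fit_transform(X), scale.transform(X)))
```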

So what you are suggesting leads to one of the following two scenarios:

Scenario 1: wrong

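1) Scale the whole X: scale.fit_transform(X) [#Learn the mean and SD of all the data, test rows included]
2) Divide the scaled X into X_scaled_train, X_scaled_test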

Scenario 2: wrong (essentially the same as scenario 1, with the scaling and splitting steps reversed)

1) Divide the X into X_train, X_test
2) scale.fit_transform(X) [# You are not using the returned value, only fitting the data, so equivalent to scale.fit(X)]
3.a) X_train_scaled = scale.transform(X_train) #[Equals X_scaled_train in scenario 1]
3.b) X_test_scaled = scale.transform(X_test) #[Equals X_scaled_test in scenario 1]
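To confirm that the two scenarios are equivalent, the sketch below runs both on hypothetical random data. It uses train_test_split with a fixed random_state so that both scenarios select the same rows. The name X_scaled is introduced here for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = rng.normal(loc=10.0, scale=4.0, size=(100, 3))  # hypothetical data

# Scenario 1: scale the whole X, then split
X_scaled = StandardScaler().fit_transform(X)
X_scaled_train, X_scaled_test = train_test_split(X_scaled, test_size=0.2, random_state=42)

# Scenario 2: split first, fit the scaler on the whole X, then transform each part
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)
scale = StandardScaler().fit(X)
X_train_scaled = scale.transform(X_train)
X_test_scaled = scale.transform(X_test)

# Same random_state -> same rows; same whole-data mean/SD -> identical arrays
print(np.allclose(X_scaled_train, X_train_scaled))
print(np.allclose(X_scaled_test, X_test_scaled))
```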

You can try either scenario, and it may improve the model's performance.

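But both scenarios miss one very important point. When you scale the whole data and only then split it into train and test, you are assuming that you already know the test (unseen) data, which is not the case in a real-world setting. The results you get will therefore not match real-world results, because in the real world all the data we have is our training data. It can also lead to overfitting, because the model already has some information about the test data.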
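The sketch below shows the leakage directly, using hypothetical random data. When the scaler is fitted on train and test together (scenario 1 style), changing only the test rows changes the scaled training features, so information about the test set reaches the model during training.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data
rng = np.random.default_rng(3)
X_train = rng.normal(size=(80, 2))
X_test = rng.normal(size=(20, 2))
X_test_other = X_test + 100.0  # pretend the unseen data had turned out very different

# Scenario 1 style: the scaler is fitted on train + test together
leaky_a = StandardScaler().fit(np.vstack([X_train, X_test])).transform(X_train)
leaky_b = StandardScaler().fit(np.vstack([X_train, X_test_other])).transform(X_train)

# The scaled *training* features depend on what the test data looks like
print(np.allclose(leaky_a, leaky_b))  # False
```

Fitting the scaler on X_train alone removes this dependency, because the test rows never enter the computed mean and SD.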

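So, when evaluating a machine learning model's performance, it is recommended to set the test data aside before doing anything else, because it stands for unseen data that we know nothing about. The ideal procedure, in my opinion, is: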

1) Divide X into X_train and X_test (same for y)
2) X_train_scaled = scale.fit_transform(X_train) [#Learn the mean and SD of train data]
3) X_test_scaled = scale.transform(X_test) [#Use the mean and SD learned in step2 to convert test data]
4) Use the X_train_scaled for training the model and X_test_scaled in evaluation.
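Steps 1 to 3 can be sketched as follows, assuming hypothetical random data. The final check confirms that the test data was converted with the training mean and SD rather than its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 4))  # hypothetical data
y = X.sum(axis=1) + rng.normal(size=200)

# 1) Set the test data aside first
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 2) Learn mean and SD from the training data only
scale = StandardScaler()
X_train_scaled = scale.fit_transform(X_train)

# 3) Convert the test data with the training mean and SD
X_test_scaled = scale.transform(X_test)

# Check: the test data was scaled with training statistics, not its own
expected = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_test_scaled, expected))

# 4) X_train_scaled goes to the model's fit(), X_test_scaled is used for evaluation
```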

Hope this helps.
