I am trying to optimize an unsupervised Kernel PCA algorithm. Here is some background:
Another approach, this time entirely unsupervised, is to select the kernel and hyperparameters that yield the lowest reconstruction error. However, reconstruction is not as easy as with linear PCA.
Fortunately, it is possible to find a point in the original space that would map close to the reconstructed point. This is called the reconstruction pre-image. Once you have this pre-image, you can measure its squared distance to the original instance. You can then select the kernel and hyperparameters that minimize this reconstruction pre-image error.
One solution is to train a supervised regression model, with the projected instances as the training set and the original instances as the targets.
Now you can use grid search with cross-validation to find the kernel and hyperparameters that minimize this pre-image reconstruction error.
The code provided in the book, which performs the reconstruction without cross-validation, is:
from sklearn.decomposition import KernelPCA
from sklearn.metrics import mean_squared_error

rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.0433,
                    fit_inverse_transform=True)
X_reduced = rbf_pca.fit_transform(X)
X_preimage = rbf_pca.inverse_transform(X_reduced)

>>> mean_squared_error(X, X_preimage)
32.786308795766132
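For context, here is a self-contained version of the snippet above; the post does not define X, so a Swiss-roll toy dataset is assumed as a stand-in:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.metrics import mean_squared_error

# Stand-in data (assumption: any point cloud works; replace with your own X)
X, _ = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

# fit_inverse_transform=True trains the internal regression model that maps
# projected instances back to approximate pre-images in the original space
rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.0433,
                    fit_inverse_transform=True)
X_reduced = rbf_pca.fit_transform(X)
X_preimage = rbf_pca.inverse_transform(X_reduced)

# Pre-image reconstruction error: mean squared distance between the
# original instances and their reconstructions
print(mean_squared_error(X, X_preimage))
```

The exact error value depends on the data, so it will not match the book's number here.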
My question is: how can I implement cross-validation to select the kernel and hyperparameters that minimize this pre-image reconstruction error?
My attempt so far is below. Thank you.
GridSearchCV is capable of doing cross-validation for unsupervised learning (without a y), as can be seen here in the documentation. So the only thing that remains to be handled is how the scoring will be done.

The following will happen in GridSearchCV:

1. The data X will be split into train and test folds according to the cv param.
2. For each combination of parameters that you specified in param_grid, the model will be trained on the train part from the step above, and then scoring will be used on the test part.
3. The scores for each parameter combination will be combined across all folds and averaged, and the highest-performing parameter combination will be selected.

Now the tricky part is step 2. By default, if you supply a 'string' as the scoring argument, it will be converted internally into a make_scorer object. For 'mean_squared_error' the relevant code is here. This is not what you want, because it requires y_true and y_pred.

The other option is to write your own custom scorer, as discussed here, with the signature (estimator, X, y), and pass it as the scoring argument of GridSearchCV.
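Such a custom scorer and the grid search can be sketched as follows. This is a minimal sketch: the scorer name my_scorer, the parameter ranges, and the stand-in Swiss-roll data are illustrative assumptions, not taken from the original post.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV

# Stand-in data (assumption); replace with your own X
X, _ = make_swiss_roll(n_samples=500, noise=0.2, random_state=42)

def my_scorer(estimator, X, y=None):
    """Score a fitted KernelPCA on a held-out fold by its pre-image error."""
    X_reduced = estimator.transform(X)
    X_preimage = estimator.inverse_transform(X_reduced)
    # GridSearchCV maximizes the score, so return the negated error
    return -mean_squared_error(X, X_preimage)

# Illustrative grid; only PSD kernels are used here to keep KernelPCA happy
param_grid = {
    "gamma": np.linspace(0.01, 0.1, 5),
    "kernel": ["rbf", "linear", "poly"],
}

kpca = KernelPCA(n_components=2, fit_inverse_transform=True)
grid_search = GridSearchCV(kpca, param_grid, cv=3, scoring=my_scorer)
grid_search.fit(X)  # no y: unsupervised
print(grid_search.best_params_)
```

Because the scorer returns the negated error, grid_search.best_score_ is the (negative) pre-image reconstruction error of the winning combination, and fit is called with X alone since there are no labels.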