Kernel and hyperparameter selection for Kernel PCA reduction

Posted 2024-09-25 04:31:29


I am reading Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.

I am trying to optimize an unsupervised Kernel PCA algorithm. Here is some background:

Another approach, this time entirely unsupervised, is to select the kernel and hyperparameters that yield the lowest reconstruction error. However, reconstruction is not as easy as with linear PCA.

...

Fortunately, it is possible to find a point in the original space that would map close to the reconstructed point. This is called the reconstruction pre-image. Once you have this pre-image, you can measure its squared distance to the original instance. You can then select the kernel and hyperparameters that minimize this reconstruction pre-image error.

One solution is to train a supervised regression model, with the projected instances as the training set and the original instances as the targets.

Now you can use grid search with cross-validation to find the kernel and hyperparameters that minimize this pre-image reconstruction error.

The code provided in the book, which performs the reconstruction without cross-validation, is:

from sklearn.decomposition import KernelPCA

rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.0433,
                    fit_inverse_transform=True)
X_reduced = rbf_pca.fit_transform(X)
X_preimage = rbf_pca.inverse_transform(X_reduced)

>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(X, X_preimage)
32.786308795766132
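Without cross-validation, the book's selection procedure amounts to a plain loop over candidate hyperparameters, keeping whichever yields the lowest pre-image MSE. A minimal sketch (the Swiss-roll toy data and the gamma grid are assumptions, not values from the question):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.metrics import mean_squared_error

# Toy data; the book's Kernel PCA example uses a Swiss roll.
X, _ = make_swiss_roll(n_samples=500, noise=0.1, random_state=42)

# Try each candidate gamma and keep the one with the lowest
# pre-image reconstruction error on the full data set.
best_gamma, best_mse = None, np.inf
for gamma in np.linspace(0.01, 0.1, 10):
    rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=gamma,
                        fit_inverse_transform=True)
    X_reduced = rbf_pca.fit_transform(X)
    X_preimage = rbf_pca.inverse_transform(X_reduced)
    mse = mean_squared_error(X, X_preimage)
    if mse < best_mse:
        best_gamma, best_mse = gamma, mse

print(best_gamma, best_mse)
```

The weakness of this loop, and the point of the question, is that it evaluates on the same data it was fitted on; cross-validation fixes that.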

My question is: how do I implement cross-validation to tune the kernel and hyperparameters so as to minimize this pre-image reconstruction error?

Thank you.


1 Answer

Answered 2024-09-25 04:31:29

GridSearchCV can perform cross-validation for unsupervised learning (without a y), as can be seen here in the documentation:

fit(X, y=None, groups=None, **fit_params)

...
y : array-like, shape = [n_samples] or [n_samples, n_output], optional 
Target relative to X for classification or regression; 
None for unsupervised learning
...

So the only thing left to handle is how the scoring will be done.

Inside GridSearchCV the following happens:

  1. The data X is split into train and test folds according to the cv param.

  2. For each parameter combination you specified in param_grid, the model is trained on the train part from the step above and then scored on the test part using scoring.

  3. The scores for each parameter combination are combined and averaged over all folds. The highest-scoring parameter combination is selected.
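The three steps above can be sketched by hand with KFold; this is an illustrative reimplementation, not GridSearchCV's actual internals, and the pre-image MSE used to score each fold is an assumption carried over from the question:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X, _ = make_swiss_roll(n_samples=300, random_state=0)
param_candidates = [{"gamma": 0.03}, {"gamma": 0.05}]  # step 2's grid

results = {}
for params in param_candidates:
    fold_scores = []
    # Step 1: split X into train/test folds (this is what cv controls).
    for train_idx, test_idx in KFold(n_splits=3).split(X):
        # Step 2: fit on the train part, score on the test part.
        kpca = KernelPCA(n_components=2, kernel="rbf",
                         fit_inverse_transform=True, **params)
        kpca.fit(X[train_idx])
        X_test = X[test_idx]
        X_preimage = kpca.inverse_transform(kpca.transform(X_test))
        # Negate so that "greater is better", as scorers expect.
        fold_scores.append(-mean_squared_error(X_test, X_preimage))
    # Step 3: average the fold scores for this parameter combination.
    results[params["gamma"]] = np.mean(fold_scores)

best_gamma = max(results, key=results.get)  # highest (least negative) score
```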

Now the tricky part is step 2. By default, if you supply a 'string' there, it is converted internally into a scorer object via make_scorer. For 'mean_squared_error' the relevant code is here:

....
neg_mean_squared_error_scorer = make_scorer(mean_squared_error,
                                            greater_is_better=False)
....

This is not what you want, because it requires both y_true and y_pred.

The other option is to make your own custom scorer, as discussed here, with the signature (estimator, X, y): it should project X with the fitted estimator, reconstruct the pre-image, and return the negative mean squared error between X and that reconstruction.
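The scorer's code block did not survive on this page; a minimal sketch of such a scorer (the name my_scorer matches the GridSearchCV call below) would be:

```python
from sklearn.metrics import mean_squared_error

def my_scorer(estimator, X, y=None):
    """Score an already-fitted KernelPCA by its pre-image error.

    GridSearchCV fits the estimator on the train fold, then calls
    this with the test fold. Return the *negative* MSE so that a
    higher score means a lower reconstruction error.
    """
    X_reduced = estimator.transform(X)
    X_preimage = estimator.inverse_transform(X_reduced)
    return -1 * mean_squared_error(X, X_preimage)
```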

Then use it in GridSearchCV like this:

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import GridSearchCV

param_grid = [{
    "gamma": np.linspace(0.03, 0.05, 10),
    "kernel": ["rbf", "sigmoid", "linear", "poly"]
}]

kpca = KernelPCA(fit_inverse_transform=True, n_jobs=-1)
grid_search = GridSearchCV(kpca, param_grid, cv=3, scoring=my_scorer)
grid_search.fit(X)
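Putting it all together, an end-to-end sketch might look as follows (the toy data, the reduced gamma grid, and restricting to the RBF kernel are assumptions made to keep the example small; best_params_ and best_score_ are standard GridSearchCV attributes):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV

X, _ = make_swiss_roll(n_samples=300, random_state=0)

def my_scorer(estimator, X, y=None):
    # Negative pre-image reconstruction error (higher is better).
    X_preimage = estimator.inverse_transform(estimator.transform(X))
    return -mean_squared_error(X, X_preimage)

param_grid = [{"gamma": np.linspace(0.03, 0.05, 5), "kernel": ["rbf"]}]
kpca = KernelPCA(n_components=2, fit_inverse_transform=True)
grid_search = GridSearchCV(kpca, param_grid, cv=3, scoring=my_scorer)
grid_search.fit(X)  # note: no y needed

print(grid_search.best_params_)  # winning kernel/gamma combination
print(grid_search.best_score_)   # its mean (negative) pre-image MSE
```

Because the scorer negates the MSE, the best_score_ is the least negative value; negate it again to recover the actual reconstruction error.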
