Scikit-learn GaussianProcessClassifier MemoryError when calling fit()

Posted 2024-10-01 17:30:08


I have X_train and y_train numpy ndarrays with shapes (32561, 108) and (32561,) respectively.

Every time I call fit on my GaussianProcessClassifier, I get a MemoryError.

>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> from sklearn.gaussian_process.kernels import RBF
>>> X_train.shape
(32561, 108)
>>> y_train.shape
(32561,)
>>> gp_opt = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
>>> gp_opt.fit(X_train,y_train)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 613, in fit
    self.base_estimator_.fit(X, y)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 209, in fit
    self.kernel_.bounds)]
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 427, in _constrained_optimization
    fmin_l_bfgs_b(obj_func, initial_theta, bounds=bounds)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 199, in fmin_l_bfgs_b
    **opts)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 335, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 285, in func_and_grad
    f = fun(x, *args)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 63, in __call__
    fg = self.fun(x, *args)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 201, in obj_func
    theta, eval_gradient=True)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 338, in log_marginal_likelihood
    K, K_gradient = kernel(self.X_train_, eval_gradient=True)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/kernels.py", line 753, in __call__
    K1, K1_gradient = self.k1(X, Y, eval_gradient=True)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/kernels.py", line 1002, in __call__
    K = self.constant_value * np.ones((X.shape[0], Y.shape[0]))
  File "/home/retsim/.local/lib/python2.7/site-packages/numpy/core/numeric.py", line 188, in ones
    a = empty(shape, dtype, order)
MemoryError
>>> 

Why am I getting this error, and how can I fix it?


3 Answers

According to the Scikit-learn documentation, the GaussianProcessClassifier estimator (as well as GaussianProcessRegressor) has a parameter copy_X_train, which is set to True by default:

class sklearn.gaussian_process.GaussianProcessClassifier(kernel=None, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, max_iter_predict=100, warm_start=False, copy_X_train=True, random_state=None, multi_class='one_vs_rest', n_jobs=1)

The description of the copy_X_train parameter states:

If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.

I tried fitting the estimator on a PC with 32 GB of RAM, using a training dataset of a size (observations and features) similar to the one the OP mentions. With copy_X_train set to True, the "persistent copy of the training data" apparently filled up my RAM and caused the MemoryError. Setting this parameter to False fixed the problem.
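
For reference, a minimal sketch of that change, assuming the same X_train and y_train as in the question (whether it is enough still depends on your available RAM, since fit() also builds an (N, N) kernel matrix during optimization):

>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> from sklearn.gaussian_process.kernels import RBF
>>> # copy_X_train=False stores only a reference to X_train instead of a copy
>>> gp_opt = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
...                                    copy_X_train=False)
>>> gp_opt.fit(X_train, y_train)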

Scikit-learn's description states that with this setting only a reference to the training data is stored, which might cause predictions to change if the data is modified externally. My interpretation of that statement:

Instead of storing the whole training dataset (in the form of a matrix of size n x n based on n observations) in the fitted estimator, only a reference to this dataset is stored - hence avoiding the high RAM usage. As long as the dataset stays intact externally (i.e. not within the fitted estimator), it can be reliably fetched when a prediction has to be made. Modification of the dataset affects the predictions.
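
To make that caveat concrete, here is a tiny toy illustration (the data below is made up for demonstration and has nothing to do with the question's dataset):

>>> import numpy as np
>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> X = np.array([[0.0], [1.0], [2.0], [3.0]])
>>> y = np.array([0, 0, 1, 1])
>>> clf = GaussianProcessClassifier(copy_X_train=False).fit(X, y)
>>> p_before = clf.predict_proba([[0.5]])
>>> X += 1.0                                 # modify the training data in place
>>> p_after = clf.predict_proba([[0.5]])     # will generally differ from p_before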

There may well be better and more theoretical explanations.

Line 400 of gpc.py, the implementation of the classifier you are using, creates a matrix of size (N, N), where N is the number of observations. So the code is trying to create a matrix of shape (32561, 32561), which is obviously going to cause problems: that matrix has over a billion elements.
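
To make the scale concrete, a quick back-of-the-envelope check (assuming float64 entries at 8 bytes each):

>>> n = 32561
>>> print(n * n)                               # elements in an (n, n) matrix
1060218721
>>> print(round(n * n * 8 / 1024.0 ** 3, 1))   # approximate size in GiB
7.9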

As for why it does that, I don't really know scikit-learn's implementation, but in general Gaussian processes need to estimate covariance matrices over the whole input space, which is why they are not great when you have high-dimensional data. (The docs consider anything beyond a few dozen features "high-dimensional".)

As for how to fix it, my only suggestion would be to work in batches. Scikit-learn may have some utilities to generate batches for you, or you can do it manually.
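
GaussianProcessClassifier has no partial_fit, so the "manual" route effectively means fitting on one manageable subset at a time (or an ensemble of such subsets). A rough sketch of the idea, with a 2000-point subsample chosen arbitrarily for illustration:

>>> import numpy as np
>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> from sklearn.gaussian_process.kernels import RBF
>>> rng = np.random.RandomState(0)
>>> idx = rng.choice(X_train.shape[0], size=2000, replace=False)   # random subset
>>> gp_small = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
>>> gp_small.fit(X_train[idx], y_train[idx])    # (2000, 2000) kernel matrix instead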

Take a look at dimensionality reduction techniques such as Principal Component Analysis (PCA). That will reduce the number of features and thus the size of your input matrix.
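
A sketch of what that could look like with scikit-learn's PCA, assuming the X_train from the question; the choice of 10 components is arbitrary and would need tuning:

>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=10)                   # 108 features down to 10
>>> X_train_reduced = pca.fit_transform(X_train)
>>> X_train_reduced.shape
(32561, 10)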
