Answers to: Scikit-learn GaussianProcessClassifier memory error when using fit() function

Scikit-learn GaussianProcessClassifier memory error when using fit() function


I have X_train and y_train as NumPy ndarrays of shape (32561, 108) and (32561,) respectively. Every time I call fit on my GaussianProcessClassifier, I get a MemoryError.
<pre><code>>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> from sklearn.gaussian_process.kernels import RBF
>>> X_train.shape
(32561, 108)
>>> y_train.shape
(32561,)
>>> gp_opt = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
>>> gp_opt.fit(X_train,y_train)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 613, in fit
    self.base_estimator_.fit(X, y)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 209, in fit
    self.kernel_.bounds)]
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 427, in _constrained_optimization
    fmin_l_bfgs_b(obj_func, initial_theta, bounds=bounds)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 199, in fmin_l_bfgs_b
    **opts)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 335, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 285, in func_and_grad
    f = fun(x, *args)
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "/home/retsim/anaconda2/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 63, in __call__
    fg = self.fun(x, *args)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 201, in obj_func
    theta, eval_gradient=True)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gpc.py", line 338, in log_marginal_likelihood
    K, K_gradient = kernel(self.X_train_, eval_gradient=True)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/kernels.py", line 753, in __call__
    K1, K1_gradient = self.k1(X, Y, eval_gradient=True)
  File "/home/retsim/.local/lib/python2.7/site-packages/sklearn/gaussian_process/kernels.py", line 1002, in __call__
    K = self.constant_value * np.ones((X.shape[0], Y.shape[0]))
  File "/home/retsim/.local/lib/python2.7/site-packages/numpy/core/numeric.py", line 188, in ones
    a = empty(shape, dtype, order)
MemoryError
>>> </code></pre>
Why am I getting this error, and how can I fix it?


1 Answer


According to the Scikit-learn <a href="http://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessClassifier.html#sklearn.gaussian_process.GaussianProcessClassifier" rel="nofollow noreferrer">documentation</a>, the estimator GaussianProcessClassifier (as well as GaussianProcessRegressor) has a parameter copy_X_train, which is set to True by default:
<blockquote> class sklearn.gaussian_process.GaussianProcessClassifier(kernel=None, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, max_iter_predict=100, warm_start=False, copy_X_train=True, random_state=None, multi_class='one_vs_rest', n_jobs=1) </blockquote>
The description of the parameter copy_X_train states:
<blockquote> If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally. </blockquote>
I tried fitting the estimator on a PC with 32 GB of RAM, using a training dataset of similar size (observations and features) to the one the OP mentions. With copy_X_train set to True, the "persistent copy of the training data" was possibly eating up my RAM, resulting in a <code>MemoryError</code>. Setting this parameter to False fixed the issue.
Scikit-learn's description notes that, with this setting, only a reference to the training data is stored, which might cause predictions to change if the data is modified externally. My interpretation of this statement is:
<blockquote> Instead of storing the whole training dataset (in the form of a matrix of size nxn based on n observations) in the fitted estimator, only a reference to this dataset is stored - hence avoiding the high RAM usage. As long as the dataset stays intact externally (i.e. not within the fitted estimator), it can be reliably fetched when a prediction has to be made. Modification of the dataset affects the predictions. </blockquote>
There may well be better interpretations and theoretical explanations.
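To put rough numbers on the sizes discussed above, here is a minimal sketch. It assumes the arrays are stored as float64 (8 bytes per value), which the question does not state. The helper `dense_array_bytes` and the synthetic `X_small` / `y_small` data are hypothetical names introduced only for illustration; the fit runs on a small stand-in dataset, not on the OP's data.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF


def dense_array_bytes(shape, itemsize=8):
    """Bytes needed for a dense array of `shape` (itemsize=8 assumes float64)."""
    n_bytes = itemsize
    for dim in shape:
        n_bytes *= dim
    return n_bytes


# Shapes taken from the question.
n_samples, n_features = 32561, 108
train_bytes = dense_array_bytes((n_samples, n_features))  # one copy of X_train
kernel_bytes = dense_array_bytes((n_samples, n_samples))  # one n x n matrix

print("X_train copy : %.1f MiB" % (train_bytes / 2.0 ** 20))
print("n x n matrix : %.1f GiB" % (kernel_bytes / 2.0 ** 30))

# Small synthetic stand-in for the real data (hypothetical, illustration only).
rng = np.random.RandomState(0)
X_small = rng.normal(size=(200, n_features))
y_small = (X_small[:, 0] > 0).astype(int)

# copy_X_train=False stores a reference to the training data instead of a copy.
gp_opt = GaussianProcessClassifier(
    kernel=1.0 * RBF(length_scale=1.0),
    copy_X_train=False,
)
gp_opt.fit(X_small, y_small)
pred = gp_opt.predict(X_small[:5])
print(pred)
```

Under the float64 assumption, one copy of X_train comes to roughly 27 MiB, while a single n x n array such as the `np.ones((X.shape[0], Y.shape[0]))` allocation in the OP's traceback comes to about 7.9 GiB. Whether copy_X_train=False alone is enough may therefore depend on how much memory is free on the machine at hand.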
