<p>According to the Scikit-Learn <a href="http://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessClassifier.html#sklearn.gaussian_process.GaussianProcessClassifier" rel="nofollow noreferrer">documentation</a>, the estimator <em>GaussianProcessClassifier</em> (as well as <em>GaussianProcessRegressor</em>) has a parameter <em>copy_X_train</em>, which is set to <em>True</em> by default:</p>
<blockquote>
<p>class sklearn.gaussian_process.GaussianProcessClassifier(kernel=None,
optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0,
max_iter_predict=100, warm_start=False, copy_X_train=True,
random_state=None, multi_class='one_vs_rest', n_jobs=1)</p>
</blockquote>
<p>The description of the parameter <em>copy_X_train</em> states:</p>
<blockquote>
<p>If True, a persistent copy of the training data is stored in the
object. Otherwise, just a reference to the training data is stored,
which might cause predictions to change if the data is modified
externally.</p>
</blockquote>
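<p>I tried fitting the estimator on a PC with 32 GB of RAM, using a training dataset of a similar size (observations and features) to the one mentioned by the OP. With <em>copy_X_train</em> set to <em>True</em>, the <em>"persistent copy of the training data"</em> was probably eating up my RAM, which led to a <code>MemoryError</code>. Setting this parameter to <em>False</em> fixed the issue.</p>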
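<p>For reference, a minimal sketch of that change. The synthetic data, the <code>RBF</code> kernel and the variable names are assumptions for illustration only, not the OP's actual setup:</p>

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Small synthetic stand-in for the real training set (hypothetical data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# copy_X_train=False asks the estimator to keep a reference to X_train
# rather than storing its own persistent copy.
clf = GaussianProcessClassifier(
    kernel=RBF(length_scale=1.0),
    copy_X_train=False,
    random_state=0,
)
clf.fit(X_train, y_train)

pred = clf.predict(X_train[:5])
proba = clf.predict_proba(X_train[:5])
print(pred.shape, proba.shape)
```

<p>With this setting, <code>X_train</code> should be left unmodified for as long as <code>clf</code> is used for predictions.</p>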
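<p>Scikit-Learn's description states that, with this setting, <em>just a reference to the training data is stored, which might cause predictions to change if the data is modified externally</em>. My interpretation of this statement is:</p>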
<blockquote>
<p>Instead of storing the whole training dataset (in the form of a matrix
of size <em>nxn</em> based on <em>n</em> observations) in the fitted estimator, only
a reference to this dataset is stored - hence avoiding the high RAM
usage. As long as the dataset stays intact externally (i.e. not within
the fitted estimator), it can be reliably fetched when a prediction
has to be made. Modification of the dataset affects the predictions.</p>
</blockquote>
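<p>One way to check the copy-versus-reference behaviour is to test whether the stored array shares memory with the input. This is only a sketch: it peeks at <code>base_estimator_.X_train_</code>, which is an implementation detail of the binary case, and it assumes that input validation does not itself copy the array (expected for a float64, C-contiguous NumPy array, but possibly version-dependent):</p>

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

# Hypothetical toy data: float64, C-contiguous, two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] > 0).astype(int)

copied = GaussianProcessClassifier(copy_X_train=True, random_state=0).fit(X, y)
referenced = GaussianProcessClassifier(copy_X_train=False, random_state=0).fit(X, y)

# For a binary problem, the fitted sub-estimator keeps the training inputs
# in X_train_; np.shares_memory tells us whether that array aliases X.
copy_is_alias = np.shares_memory(copied.base_estimator_.X_train_, X)
ref_is_alias = np.shares_memory(referenced.base_estimator_.X_train_, X)
print("copy_X_train=True aliases X: ", copy_is_alias)
print("copy_X_train=False aliases X:", ref_is_alias)
```

<p>If the second check reports aliasing, any in-place change to <code>X</code> after fitting also changes the data the estimator uses at prediction time, which is the caveat the documentation warns about.</p>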
<p>There may well be better interpretations and theoretical explanations.</p>