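<p><code>sklearn</code> estimators implement methods that make it easy to save the relevant trained properties of an estimator. Some estimators implement <code>__getstate__</code> themselves, but others, such as <code>GMM</code>, just use the <a href="https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py" rel="nofollow noreferrer">base implementation</a>, which simply saves the object's inner dictionary:</p>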
<pre><code>def __getstate__(self):
    try:
        state = super(BaseEstimator, self).__getstate__()
    except AttributeError:
        state = self.__dict__.copy()

    if type(self).__module__.startswith('sklearn.'):
        return dict(state.items(), _sklearn_version=__version__)
    else:
        return state
</code></pre>
<p>The recommended way to save your model to disk is to use the <a href="https://docs.python.org/3/library/pickle.html" rel="nofollow noreferrer"><code>pickle</code></a> module.</p>
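<p>A minimal sketch of that round trip, assuming an <code>SVC</code> fitted on the iris data; the estimator, the dataset and the file name <code>mymodel.pkl</code> are illustrative choices, not requirements:</p>
<pre><code>import os
import pickle
import tempfile

from sklearn import datasets
from sklearn.svm import SVC

# Fit a small model (illustrative choice of data and estimator).
X, y = datasets.load_iris(return_X_y=True)
model = SVC().fit(X, y)

# Serialize the fitted estimator to a file opened in binary mode.
model_path = os.path.join(tempfile.mkdtemp(), "mymodel.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back; only unpickle files that come from a source you trust.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

# The restored estimator should predict exactly like the original one.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
</code></pre>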
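<p>However, you should save additional data so that you can retrain your model in the future, or you will suffer dire consequences <strong>(such as being locked into an old version of sklearn)</strong>.</p>
<p>From the <a href="http://scikit-learn.org/stable/modules/model_persistence.html" rel="nofollow noreferrer">documentation</a>:</p>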
<blockquote>
<p>In order to rebuild a similar model with future versions of
scikit-learn, additional metadata should be saved along the pickled
model: </p>
<p>The training data, e.g. a reference to a immutable snapshot </p>
<p>The python source code used to generate the model </p>
<p>The versions of scikit-learn and its dependencies </p>
<p>The cross validation score obtained on the training data</p>
</blockquote>
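<p>This matters most for estimators whose pickled state is tied to internal implementation details, because that coupling is not guaranteed to be stable between versions of sklearn. It has seen backwards-incompatible changes in the past.</p>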
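<p>The four items from the quoted list can be recorded in a small JSON file stored next to the pickle. The sketch below is one possible layout, not something sklearn provides: the helper name <code>build_model_metadata</code>, its parameters and the dictionary keys are all hypothetical.</p>
<pre><code>import hashlib
import json
import platform
from importlib import metadata


def build_model_metadata(training_data_path, source_path, cv_score,
                         packages=("scikit-learn", "numpy", "scipy")):
    """Collect the items from the quoted list into a JSON-serializable dict.

    Hypothetical helper: the name, arguments and keys are assumptions.
    """
    # Hash the training file so the exact snapshot can be identified later.
    with open(training_data_path, "rb") as f:
        data_sha256 = hashlib.sha256(f.read()).hexdigest()

    # Record the interpreter version and the installed package versions.
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment

    return {
        "training_data": {"path": training_data_path, "sha256": data_sha256},
        "source_code": source_path,
        "versions": versions,
        "cv_score": cv_score,
    }
</code></pre>
<p>The returned dictionary can be written with <code>json.dump</code> to a file that sits beside the pickled model, so both travel together.</p>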
<p>If your models become very large and loading becomes a nuisance, you can also use the more efficient <code>joblib</code>. From the documentation:</p>
<blockquote>
<p>In the specific case of the scikit, it may be more interesting to use
joblib’s replacement of <code>pickle</code> (<code>joblib.dump</code> & <code>joblib.load</code>), which is
more efficient on objects that carry large numpy arrays internally as
is often the case for fitted scikit-learn estimators, but can only
pickle to the disk and not to a string:</p>
</blockquote>
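<p>A minimal sketch of the <code>joblib.dump</code> / <code>joblib.load</code> round trip the quote refers to, assuming <code>joblib</code> is installed as a standalone package; the estimator, the file name and the <code>compress=3</code> setting are illustrative choices:</p>
<pre><code>import os
import tempfile

import joblib
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Fit a small model (illustrative choice of data and estimator).
X, y = datasets.load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# joblib.dump writes to a file path; compress is optional and trades speed for size.
model_path = os.path.join(tempfile.mkdtemp(), "mymodel.joblib")
joblib.dump(model, model_path, compress=3)

# Load it back; as with pickle, only load files you trust.
restored = joblib.load(model_path)

# The restored estimator should predict exactly like the original one.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
</code></pre>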