<p>Disclaimer: I'm new to xgboost as well, but I think I've figured this out.</p>
<p>Try saving your model after the first batch of training. Then, on subsequent runs, provide the xgb.train method with the file path of the saved model.</p>
<p>Here's a small experiment I ran to convince myself that it works:</p>
<p>First, split the Boston dataset into training and test sets.
Then split the training set in half.
Fit a model on the first half and score it on the test set to get a benchmark.
Then fit two models on the second half; one of them gets the additional parameter <em>xgb_model</em>. If passing the extra parameter made no difference, we would expect their scores to be similar.
Fortunately, though, the new model seems to perform much better than the one trained from scratch.</p>
<pre><code>import xgboost as xgb
from sklearn.model_selection import train_test_split as ttsplit
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error as mse

X = load_boston()['data']
y = load_boston()['target']

# split data into training and testing sets
# then split training set in half
X_train, X_test, y_train, y_test = ttsplit(X, y, test_size=0.1, random_state=0)
X_train_1, X_train_2, y_train_1, y_train_2 = ttsplit(X_train,
                                                     y_train,
                                                     test_size=0.5,
                                                     random_state=0)

xg_train_1 = xgb.DMatrix(X_train_1, label=y_train_1)
xg_train_2 = xgb.DMatrix(X_train_2, label=y_train_2)
xg_test = xgb.DMatrix(X_test, label=y_test)

params = {'objective': 'reg:linear', 'verbose': False}

model_1 = xgb.train(params, xg_train_1, 30)
model_1.save_model('model_1.model')

# ================= train two versions of the model =====================#
model_2_v1 = xgb.train(params, xg_train_2, 30)
model_2_v2 = xgb.train(params, xg_train_2, 30, xgb_model='model_1.model')

print(mse(model_1.predict(xg_test), y_test))     # benchmark
print(mse(model_2_v1.predict(xg_test), y_test))  # "before"
print(mse(model_2_v2.predict(xg_test), y_test))  # "after"
# 23.0475232194
# 39.6776876084
# 27.2053239482
</code></pre>
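<p>As a side note, a saved model file can also be reloaded later for prediction alone, without any retraining, by constructing a Booster directly from the file. Here's a minimal, self-contained sketch (the random data and the file name <code>demo_model.json</code> are just placeholders for this illustration):</p>
<pre><code>import numpy as np
import xgboost as xgb

# train a tiny model on random data and save it to disk
X = np.random.rand(100, 4)
y = np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({'objective': 'reg:squarederror'}, dtrain, 10)
booster.save_model('demo_model.json')

# reload the saved model into a fresh Booster for prediction only
loaded = xgb.Booster(model_file='demo_model.json')
preds = loaded.predict(xgb.DMatrix(X))
print(preds.shape)  # (100,)
</code></pre>
<p>This is the same mechanism <code>xgb_model='model_1.model'</code> relies on above: xgb.train loads the saved booster from the file and keeps adding trees to it.</p>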
<p>Let me know if anything is unclear!</p>
<p>Reference: <a href="https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/training.py" rel="noreferrer">https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/training.py</a></p>