<p>I created <a href="https://gist.github.com/53fef94cc61d6a3e9b3eb900482f41e0" rel="noreferrer">a gist of a jupyter notebook</a> to demonstrate that an xgboost model can be trained incrementally. I trained the model on the Boston housing dataset and ran three experiments: one-shot learning, iterative one-shot learning, and iterative incremental learning. In the incremental setup, I fed the Boston data to the model in batches of 50 rows.</p>
<p>The gist of the gist is that you have to iterate over the data multiple times for the model to converge to the accuracy attained by one-shot (all data at once) learning.</p>
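<p>The experiments assume a train/test split <code>x_tr, y_tr, x_te, y_te</code>. A minimal sketch of preparing such a split follows; note that <code>load_boston</code> has since been removed from scikit-learn (1.2+), so a synthetic regression set of the same shape stands in here as a hypothetical substitute.</p>
<pre><code>from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston housing data:
# 506 samples, 13 features, like the original dataset
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

# Hold out a quarter of the rows for evaluation
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
</code></pre>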
<p>Here is the corresponding code for iterative incremental learning with xgboost.</p>
<pre><code>import xgboost as xgb
import sklearn.metrics

batch_size = 50
iterations = 25
model = None
for i in range(iterations):
    # One pass over the training data, batch_size rows at a time
    for start in range(0, len(x_tr), batch_size):
        # Passing xgb_model continues training from the previous model
        model = xgb.train({
            'learning_rate': 0.007,
            'update': 'refresh',
            'process_type': 'update',
            'refresh_leaf': True,
            #'reg_lambda': 3,  # L2
            'reg_alpha': 3,  # L1
            'silent': False,
        }, dtrain=xgb.DMatrix(x_tr[start:start+batch_size], y_tr[start:start+batch_size]), xgb_model=model)

        y_pr = model.predict(xgb.DMatrix(x_te))
        #print('  MSE itr@{}: {}'.format(int(start/batch_size), sklearn.metrics.mean_squared_error(y_te, y_pr)))
    # Test error after each full pass over the data
    print('MSE itr@{}: {}'.format(i, sklearn.metrics.mean_squared_error(y_te, y_pr)))

y_pr = model.predict(xgb.DMatrix(x_te))
print('MSE at the end: {}'.format(sklearn.metrics.mean_squared_error(y_te, y_pr)))
</code></pre>
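<p>For comparison, the one-shot baseline from the first experiment can be sketched as follows. This is a minimal illustration, not the gist's exact code: the data is the same synthetic stand-in as above, and the choice of 25 boosting rounds is only to mirror the number of passes in the incremental run.</p>
<pre><code>import xgboost as xgb
import sklearn.metrics
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical; the gist used the Boston set)
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# One-shot learning: a single xgb.train call over all training rows
model_full = xgb.train({'learning_rate': 0.007},
                       xgb.DMatrix(x_tr, y_tr),
                       num_boost_round=25)
y_pr = model_full.predict(xgb.DMatrix(x_te))
print('MSE one-shot: {}'.format(sklearn.metrics.mean_squared_error(y_te, y_pr)))
</code></pre>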
<p>XGBoost version: 0.6</p>