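<p><strong>Final edit</strong></p>
<p>There are several ways to get lambda (shown further down), but here are two concise ones. (Note: the fully reproducible code is at the bottom.)</p>
<p>If you set <code>lambda_search = True</code>, look at the <code>lambda_search</code> column of the model summary table and check the value reported for <code>lambda.min</code>, which is the best lambda:</p>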
<pre><code>model.summary()['lambda_search']
</code></pre>
<p>This produces a string that includes the <code>lambda.min</code> value.</p>
<p>If you are not using lambda search, the summary table also works, whether or not you set a lambda value yourself:</p>
<pre><code>model.summary()['regularization']
</code></pre>
<p>The output looks like this:</p>
<pre><code>['Elastic Net (alpha = 0.5, lambda = 0.01289 )']
</code></pre>
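<p>Because the summary table returns the values embedded in a string, you may want them as numbers. The following is a minimal sketch, assuming the cell has the <code>name = value</code> layout shown in the output above; <code>parse_summary_params</code> is a hypothetical helper, not part of H2O, and the input is the sample string copied from that output rather than a live model.</p>

```python
import re

def parse_summary_params(text):
    """Extract 'name = number' pairs from a GLM summary string.

    Hypothetical helper: it assumes the 'name = value' layout seen in the
    sample output and is not part of the H2O API.
    """
    pattern = r"([A-Za-z_][\w.]*)\s*=\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
    return {name: float(value) for name, value in re.findall(pattern, text)}

# Sample cell copied from the output above; with a real model you would
# pass the first element of model.summary()['regularization'] instead.
summary_cell = "Elastic Net (alpha = 0.5, lambda = 0.01289 )"
params = parse_summary_params(summary_cell)
print(params["lambda"])
```

<p>The pattern allows dots in names, so a key such as <code>lambda.min</code> should be picked up as well if the <code>lambda_search</code> string follows the same layout, which is an assumption here.</p>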
<p><strong>Other options:</strong></p>
<p>Look at the model's actual parameters:
<code>best.actual_params['lambda']</code>
<code>best.actual_params['alpha']</code></p>
<p>Here <code>best</code> is the best model from your grid search results.</p>
<p><strong>First edit</strong></p>
<p>To get the best model, you can do:</p>
<pre><code>grid_table = grid.get_grid(sort_by='r2', decreasing=True)
best = grid_table.models[0]
</code></pre>
<p>Then you can use:</p>
<pre><code>best.actual_params['lambda']
</code></pre>
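<p>If you read both values often, a small wrapper can help. This is only a sketch: <code>regularization_of</code> is a hypothetical helper, and it assumes <code>actual_params</code> is a dict whose <code>lambda</code> and <code>alpha</code> entries may be either a bare number or a one-element list. A stand-in object replaces the real model so the snippet runs without an H2O cluster.</p>

```python
from types import SimpleNamespace

def regularization_of(model):
    """Return (alpha, lambda) as floats from a model's actual_params.

    Hypothetical helper, not part of H2O. Assumes actual_params is a dict
    and that each value is a number, a list of numbers, or missing.
    """
    def first_value(value):
        # Unwrap a list/tuple such as [0.01289]; pass scalars through.
        if isinstance(value, (list, tuple)):
            value = value[0] if value else None
        return None if value is None else float(value)

    params = model.actual_params
    return first_value(params.get("alpha")), first_value(params.get("lambda"))

# Stand-in for the `best` model; the values are illustrative only.
fake_best = SimpleNamespace(actual_params={"alpha": [0.5], "lambda": [0.01289]})
alpha, lam = regularization_of(fake_best)
print(alpha, lam)
```

<p>With a trained grid you would call <code>regularization_of(best)</code> instead of passing the stand-in.</p>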
<p><strong>Fully reproducible example</strong></p>
<pre><code>import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()
# import the airlines dataset:
# This dataset is used to classify whether a flight will be delayed ("YES") or not ("NO")
# original data can be found at http://www.transtats.bts.gov/
airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
# convert columns to factors
airlines["Year"]= airlines["Year"].asfactor()
airlines["Month"]= airlines["Month"].asfactor()
airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor()
airlines["Cancelled"] = airlines["Cancelled"].asfactor()
airlines['FlightNum'] = airlines['FlightNum'].asfactor()
# set the predictor names and the response column name
predictors = ["Origin", "Dest", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance", "FlightNum"]
response = "IsDepDelayed"
# split into train and validation sets
train, valid= airlines.split_frame(ratios = [.8])
# try using the `lambda_` parameter:
# initialize your estimator
airlines_glm = H2OGeneralizedLinearEstimator(family = 'binomial', lambda_ = .0001)
# then train your model
airlines_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
# print the auc for the validation data
print(airlines_glm.auc(valid=True))
# Example of values to grid over for `lambda`
# import Grid Search
from h2o.grid.grid_search import H2OGridSearch
# select the values for lambda_ to grid over
hyper_params = {'lambda': [1, 0.5, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0]}
# this example uses cartesian grid search because the search space is small
# and we want to see the performance of all models. For a larger search space use
# random grid search instead: {'strategy': "RandomDiscrete"}
# initialize the glm estimator
airlines_glm_2 = H2OGeneralizedLinearEstimator(family = 'binomial')
# build grid search with previously made GLM and hyperparameters
grid = H2OGridSearch(model = airlines_glm_2, hyper_params = hyper_params,
                     search_criteria = {'strategy': "Cartesian"})
# train using the grid
grid.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
# sort the grid models by decreasing AUC
grid_table = grid.get_grid(sort_by = 'auc', decreasing = True)
print(grid_table)
best = grid_table.models[0]
print(best.actual_params['lambda'])
</code></pre>