h2o GLM grid search: getting the lambda value

2 answers

User

Answer 1 · edited 2024-09-30 16:41:56

Final edit

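There are several ways to get lambda (listed further down), but here are two concise ones. Fully reproducible code is at the bottom.

If you set lambda_search = True, look at the lambda_search column of the model summary table and check the value reported for lambda.min. That is the best lambda.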

model.summary()['lambda_search']

This returns a string that includes the lambda.min value.

If you are not using lambda search, whether or not you set a lambda value yourself, you can also use the summary table:

model.summary()['regularization']

The output looks like this:

['Elastic Net (alpha = 0.5, lambda = 0.01289 )']
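
If you need the number rather than the string, a small parser can pull it out. This is only a sketch: it assumes the string has exactly the format shown above, and `parse_regularization` is a hypothetical helper, not part of h2o.

```python
import re

# Matches "alpha = <number>" or "lambda = <number>".
# Assumption: the text looks like the sample output above.
_PARAM_RE = re.compile(r"(alpha|lambda)\s*=\s*([-+0-9.eE]+)")


def parse_regularization(text):
    """Return a dict with the alpha/lambda values found in text."""
    return {name: float(value) for name, value in _PARAM_RE.findall(text)}


sample = "Elastic Net (alpha = 0.5, lambda = 0.01289 )"
params = parse_regularization(sample)
print(params["lambda"])
```

With a trained model you would pass in the first element of model.summary()['regularization'] instead of `sample`. When the values are available there, best.actual_params is the more direct route because it needs no string parsing.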

Other options:

Look at the model's actual parameters:

best.actual_params['lambda']
best.actual_params['alpha']

Here best is your best model from the grid search results.

First edit

To get the best model, you can do:

grid_table = grid.get_grid(sort_by='r2', decreasing=True)
best = grid_table.models[0]

Then you can use:

best.actual_params['lambda']

Fully reproducible example

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the airlines dataset:
# This dataset is used to classify whether a flight will be delayed 'YES' or not "NO"
# original data can be found at http://www.transtats.bts.gov/
airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

# convert columns to factors
airlines["Year"]= airlines["Year"].asfactor()
airlines["Month"]= airlines["Month"].asfactor()
airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor()
airlines["Cancelled"] = airlines["Cancelled"].asfactor()
airlines['FlightNum'] = airlines['FlightNum'].asfactor()

# set the predictor names and the response column name
predictors = ["Origin", "Dest", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance", "FlightNum"]
response = "IsDepDelayed"

# split into train and validation sets
train, valid= airlines.split_frame(ratios = [.8])

# try using the `lambda_` parameter:
# initialize your estimator
airlines_glm = H2OGeneralizedLinearEstimator(family = 'binomial', lambda_ = .0001)

# then train your model
airlines_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# print the auc for the validation data
print(airlines_glm.auc(valid=True))


# Example of values to grid over for `lambda`
# import Grid Search
from h2o.grid.grid_search import H2OGridSearch

# select the values for lambda_ to grid over
hyper_params = {'lambda': [1, 0.5, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0]}

# this example uses cartesian grid search because the search space is small
# and we want to see the performance of all models. For a larger search space use
# random grid search instead: {'strategy': "RandomDiscrete"}
# initialize the glm estimator
airlines_glm_2 = H2OGeneralizedLinearEstimator(family = 'binomial')

# build grid search with previously made GLM and hyperparameters
grid = H2OGridSearch(model = airlines_glm_2, hyper_params = hyper_params,
                     search_criteria = {'strategy': "Cartesian"})

# train using the grid
grid.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# sort the grid models by decreasing AUC
grid_table = grid.get_grid(sort_by = 'auc', decreasing = True)
print(grid_table)

best = grid_table.models[0]
print(best.actual_params['lambda'])
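
To compare lambda across every model in the grid instead of reading only the top one, you can collect the values into rows. The sketch below relies only on the model_id and actual_params attributes used above; `tabulate_params` is a hypothetical helper, and the stub models with their values are made up so that the snippet runs without an H2O cluster.

```python
from types import SimpleNamespace


def tabulate_params(models, names=("alpha", "lambda")):
    """Build one dict per model holding its id and the requested parameters."""
    rows = []
    for model in models:
        row = {"model_id": model.model_id}
        for name in names:
            # .get() yields None when a parameter is absent from actual_params
            row[name] = model.actual_params.get(name)
        rows.append(row)
    return rows


# Hypothetical stand-ins for trained models (ids and values are invented).
stub_models = [
    SimpleNamespace(model_id="glm_a", actual_params={"alpha": 0.5, "lambda": 0.001}),
    SimpleNamespace(model_id="glm_b", actual_params={"lambda": 0.1}),
]

for row in tabulate_params(stub_models):
    print(row)
```

With the grid from the example you would call tabulate_params(grid_table.models), which keeps the sort order of grid_table.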

User

Answer 2 · edited 2024-09-30 16:41:56

I'm not sure why the following doesn't work:

best = grid_table.models[0]
best.actual_params["lambda"]
best.actual_params["alpha"]

This may be an h2o issue, but if you change the above to the following, you should at least be able to access the parameters:

best = grid.models[x]
best.actual_params["lambda"]
best.actual_params["alpha"]

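Note that I changed 0 to x. You need to work out which model performed best by your error criterion, because the models stored in grid may not be sorted by that criterion. To do this, look at grid_table, note the id of the best model, and check where that model sits in grid.

You should then at least be able to reference lambda and alpha. However, if you run a grid search over alpha and enable the search over lambda through the lambda_search attribute, best.actual_params["lambda"] returns the full list of lambdas that were searched. You can still get the value by following Lauren's suggestion, but I usually prefer to see everything in the table, so I suggest turning lambda_search off and adding lambda to the hyperparameters you search over.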

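import numpy as np
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from h2o.grid.grid_search import H2OGridSearch
h2o.init()

# candidate lambda values: 100 evenly spaced points in [0, 1]
lambda_search_range = list(np.linspace(0, 1, 100))
h2o_data = h2o.import_file("h2o_example.svmlight")
cols = h2o_data.columns[1:]  # all columns after the first (C1 is the response)

# grid over alpha and lambda together, with lambda_search turned off
hyper_parameters = {"alpha": [0.0, 0.01, 0.99, 1.0],
                    "lambda": lambda_search_range}
grid = H2OGridSearch(
    H2OGeneralizedLinearEstimator(family="gamma", link="log", lambda_search=False,
                                  nfolds=2, intercept=True, standardize=False),
    hyper_params=hyper_parameters)
grid.train(y="C1", x=cols, training_frame=h2o_data)

# sort by r2, then read the hyperparameters of the top-ranked model
grid_table = grid.get_grid(sort_by="r2", decreasing=True)
param_dict = grid_table.get_hyperparams_dict(grid_table.model_ids[0])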

param_dict should be a dictionary holding the alpha and lambda values of the best model according to the error criterion you specified.
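
Since the order of the models stored in a grid may not match the sorted table, it can be safer to look a model up by its id than by position. A minimal sketch, assuming only that each model exposes a model_id attribute; `find_model` and the stub objects are hypothetical and stand in for real h2o models.

```python
from types import SimpleNamespace


def find_model(models, model_id):
    """Return the first model whose model_id matches, else raise KeyError."""
    for model in models:
        if model.model_id == model_id:
            return model
    raise KeyError(f"no model with id {model_id!r}")


# Hypothetical stand-ins for trained models (ids and values are invented).
stub_models = [
    SimpleNamespace(model_id="glm_a", actual_params={"alpha": 0.0, "lambda": 0.2}),
    SimpleNamespace(model_id="glm_b", actual_params={"alpha": 1.0, "lambda": 0.01}),
]

best = find_model(stub_models, "glm_b")
print(best.actual_params["lambda"], best.actual_params["alpha"])
```

With a real grid you would pass grid.models and the id taken from the first row of grid_table.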
