H2O GAM with weights: predict no longer works

Posted 2024-10-01 00:35:32


If I train a weighted H2O GAM regression model, I cannot use it to predict. The weighting is done through the weights_column parameter.

I am running python=3.6.13, h2o=3.32.1.3, pandas=0.25.3, numpy=1.19.5, sklearn=0.24.2. Java version: openjdk version "14.0.2".

Prediction does work with:

  • an unweighted H2O GAM
  • a weighted H2O GLM
  • a weighted H2O GAM after downgrading to h2o=3.32.0.5

I have filed a bug at http://jira.h2o.ai, but I am still interested if anyone can get this working without downgrading h2o.

import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
import h2o
from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator

h2o.no_progress()
h2o.init()

np.random.seed(42)
boston = load_boston()
y = pd.Series(boston["target"], name="y")
X = pd.DataFrame(boston["data"], columns=boston["feature_names"])  # shape: (506, 13)
myweight = pd.Series(np.random.random_sample((len(y),)), name="myweight2")

predictors = ['CRIM', 'AGE']
gam_columns = ['CRIM']

params = {
    "family": "gaussian",
    "gam_columns": gam_columns,
    'bs': len(gam_columns) * [0],
}

df0 = pd.concat([y, X, myweight], axis=1)
df = h2o.H2OFrame(python_obj=df0)

model = H2OGeneralizedAdditiveEstimator(**params)
model.train(
    x=predictors,
    y="y",
    weights_column="myweight2",
    training_frame=df,
)

print('df.shape', df.shape)
y_pred = model.predict(df)
print('y_pred:', y_pred.as_data_frame()["predict"].values[0:5])

This is the output I get. The error complains about myweight2:

Checking whether there is an H2O instance running at http://localhost:54321 . connected.
--------------------------  ------------------------------------------

df.shape (506, 15)
Traceback (most recent call last):
  File "/Users/g009655/tmp7/h2otest/test_gam_predict.py", line 37, in <module>
    y_pred = model.predict(df)
  File "/Users/g009655/Library/Caches/pypoetry/virtualenvs/h2otest-S7Xak4Mg-py3.6/lib/python3.6/site-packages/h2o/model/model_base.py", line 237, in predict
    j.poll()
  File "/Users/g009655/Library/Caches/pypoetry/virtualenvs/h2otest-S7Xak4Mg-py3.6/lib/python3.6/site-packages/h2o/job.py", line 80, in poll
    "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
OSError: Job with key $03017f00000132d4ffffffff$_9242dd1b28497090cf9ccad52bd54b9f failed with an exception: java.lang.AssertionError:  null vec: $04ff0f000000ffffffff$_b0f0839f8f1a041e8bf5254b552e4dd3; 

name: myweight2

stacktrace: 
java.lang.AssertionError:  null vec: $04ff0f000000ffffffff$_b0f0839f8f1a041e8bf5254b552e4dd3; 

name: myweight2

    at water.fvec.Frame.<init>(Frame.java:161)
    at hex.gam.GAMModel.cleanUpInputFrame(GAMModel.java:505)
    at hex.gam.GAMModel.adaptTestForTrain(GAMModel.java:492)
    at hex.Model.score(Model.java:1697)
    at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:422)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1637)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Closing connection _sid_ad95 at exit
H2O session _sid_ad95 closed.

Process finished with exit code 1


2 Answers

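I got the same error, but I found a workaround. For me, re-creating the training H2OFrame (in my case from a pandas.DataFrame) before calling predict worked. It seems the frame somehow gets corrupted during training.

In your case, try: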

df = h2o.H2OFrame(python_obj=df0)
y_pred = model.predict(df)
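
If you score in more than one place, the same workaround can be wrapped in a small helper. This is only a sketch of the idea above, not an official H2O fix: `predict_with_fresh_frame` and its `make_frame` argument are hypothetical names introduced here, and the only assumption about the model is that it has a `predict(frame)` method, as in the question.

```python
def predict_with_fresh_frame(model, pandas_df, make_frame):
    """Score `model` on a frame rebuilt from `pandas_df`.

    `make_frame` (hypothetical parameter) is any callable that turns the
    pandas data into the frame type the model expects, for example
    ``lambda d: h2o.H2OFrame(python_obj=d)``. The frame that was used for
    training is never passed to ``predict``.
    """
    # Build a new frame so scoring does not touch the training frame.
    fresh_frame = make_frame(pandas_df)
    return model.predict(fresh_frame)


# Usage with the objects from the question (needs a running H2O cluster):
# y_pred = predict_with_fresh_frame(
#     model, df0, lambda d: h2o.H2OFrame(python_obj=d)
# )
```

Passing the frame constructor in as a callable keeps the helper independent of h2o itself, so it can be tried out with a stub model before running it against a cluster.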

Thanks for the bug report. Here is the Jira ticket, for reference.
