hgboost is a Python package for hyperparameter optimization of xgboost, catboost and lightboost, for classification and regression tasks.
hgboost - Hyperoptimized Gradient Boosting
Star it if you like it!
hgboost is short for Hyperoptimized Gradient Boosting. It is a Python package for hyperparameter optimization of xgboost, catboost and lightboost, with the results evaluated on an independent validation set. hgboost can be used for both classification and regression tasks.
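The workflow hgboost automates — search the hyperparameter space on the training data, then score the winning configuration once on held-out data — can be sketched in plain Python. The objective function and names below are illustrative toys, not hgboost's API; hgboost searches real booster parameters with hyperopt rather than random sampling.

```python
import random

# Hypothetical toy objective: the "loss" of a model as a function of a
# single hyperparameter. In hgboost the objective is the cross-validated
# score of an xgboost/catboost/lightboost model.
def objective(lr, data):
    return sum((x - lr) ** 2 for x in data) / len(data)

random.seed(42)
dataset = [0.3, 0.35, 0.25, 0.4]

# Hold out an independent validation set BEFORE searching, so the final
# score is not biased by the search itself.
train, val = dataset[:3], dataset[3:]

# Random search over the hyperparameter space (hgboost uses hyperopt's
# Tree of Parzen Estimators instead of random search).
best_lr = min((random.uniform(0.0, 1.0) for _ in range(50)),
              key=lambda lr: objective(lr, train))

# Evaluate the winning configuration once on the held-out data.
val_loss = objective(best_lr, val)
```

The key point mirrored by hgboost's `test_size`/`val_size` parameters: the validation samples never influence which hyperparameters win.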
Documentation
(Schematic overview of hgboost)
Installation
- Install hgboost from PyPI (recommended). hgboost is compatible with Python 3.6+ and runs on Linux, macOS and Windows.
- It is advisable to create a new environment, as follows:
conda create -n env_hgboost python=3.6
conda activate env_hgboost
Install the latest version of hgboost from PyPI:

pip install hgboost
Force-install the latest version:
pip install -U hgboost
Install from the GitHub source:
pip install git+https://github.com/erdogant/hgboost#egg=master
Import the hgboost package:

import hgboost as hgboost
Classification example for xgboost, catboost and lightboost:
# Load library
from hgboost import hgboost

# Initialization
hgb = hgboost(max_eval=10, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=42)
# Import data
df = hgb.import_example()
y = df['Survived'].values
y = y.astype(str)
y[y=='1'] = 'survived'
y[y=='0'] = 'dead'

# Preprocessing by encoding variables
del df['Survived']
X = hgb.preprocessing(df)
# Fit catboost by hyperoptimization and cross-validation
results = hgb.catboost(X, y, pos_label='survived')

# Fit lightboost by hyperoptimization and cross-validation
results = hgb.lightboost(X, y, pos_label='survived')

# Fit xgboost by hyperoptimization and cross-validation
results = hgb.xgboost(X, y, pos_label='survived')

# [hgboost] >Start hgboost classification..
# [hgboost] >Collecting xgb_clf parameters.
# [hgboost] >Number of variables in search space is [11], loss function: [auc].
# [hgboost] >method: xgb_clf
# [hgboost] >eval_metric: auc
# [hgboost] >greater_is_better: True
# [hgboost] >pos_label: True
# [hgboost] >Total dataset: (891, 204)
# [hgboost] >Hyperparameter optimization..
# 100% |----| 500/500 [04:39<05:21, 1.33s/trial, best loss: -0.8800619834710744]
# [hgboost] >Best performing [xgb_clf] model: auc=0.881198
# [hgboost] >5-fold cross validation for the top 10 scoring models, Total nr. tests: 50
# 100%|██████████| 10/10 [00:42<00:00, 4.27s/it]
# [hgboost] >Evalute best [xgb_clf] model on independent validation dataset (179 samples, 20.00%).
# [hgboost] >[auc] on independent validation dataset: -0.832
# [hgboost] >Retrain [xgb_clf] on the entire dataset with the optimal parameters settings.
# Plot searched parameter space
hgb.plot_params()
# Plot summary results
hgb.plot()
# Plot the best tree
hgb.treeplot()
# Plot the validation results
hgb.plot_validation()
# Plot the cross-validation results
hgb.plot_cv()
# Use the learned model to make new predictions
y_pred, y_proba = hgb.predict(X)
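Once `predict` returns labels and probabilities, any standard metric applies to them. A self-contained sketch, using toy arrays that stand in for the real `y` and `y_pred` above:

```python
# Toy ground truth and predictions standing in for y and y_pred.
y_true = ['survived', 'dead', 'survived', 'dead', 'survived']
y_pred = ['survived', 'dead', 'dead', 'dead', 'survived']

# Accuracy: fraction of matching labels (4 of 5 here).
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.8
```

For a probability-aware score such as AUC, `y_proba` would be compared against the labels instead; scikit-learn's `roc_auc_score` is the usual choice.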
Create an ensemble model for classification:
from hgboost import hgboost

hgb = hgboost(max_eval=100, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3)

# Import data
df = hgb.import_example()
y = df['Survived'].values
del df['Survived']
X = hgb.preprocessing(df, verbose=0)

results = hgb.ensemble(X, y, pos_label=1)

# Use the predictor
y_pred, y_proba = hgb.predict(X)
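Conceptually, an ensemble combines the per-method models into one predictor. A toy soft-voting sketch — averaged probabilities thresholded at 0.5 — illustrates the idea; this is illustrative only, not hgboost's internal combination rule:

```python
# Toy probability outputs from three hypothetical boosters for 4 samples.
p_xgb = [0.9, 0.2, 0.6, 0.4]
p_ctb = [0.8, 0.3, 0.4, 0.3]
p_lgb = [0.7, 0.1, 0.7, 0.2]

# Soft voting: average the probabilities, then threshold at 0.5
# (compare hgboost's `threshold` parameter).
p_mean = [sum(p) / 3 for p in zip(p_xgb, p_ctb, p_lgb)]
y_pred = [int(p >= 0.5) for p in p_mean]
print(y_pred)  # [1, 0, 1, 0]
```

Averaging tends to smooth out the errors any single booster makes on its own, which is why the ensemble often beats its best member.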
Create an ensemble model for regression:
import numpy as np
from hgboost import hgboost

hgb = hgboost(max_eval=100, threshold=0.5, cv=5, test_size=0.2, val_size=0.2, top_cv_evals=10, random_state=None, verbose=3)

# Import data
df = hgb.import_example()
y = df['Age'].values
del df['Age']

# Keep only samples with a non-missing target value
I = ~np.isnan(y)
X = hgb.preprocessing(df, verbose=0)
X = X.loc[I, :]
y = y[I]

results = hgb.ensemble(X, y, methods=['xgb_reg', 'ctb_reg', 'lgb_reg'])

# Use the predictor
y_pred, y_proba = hgb.predict(X)
# Plot the ensemble classification validation results
hgb.plot_validation()
Citation
Please cite hgboost in your publications if it has been useful for your research. Here is an example BibTeX entry:
@misc{erdogant2020hgboost,
  title={hgboost},
  author={Erdogan Taskesen},
  year={2020},
  howpublished={\url{https://github.com/erdogant/hgboost}},
}
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- Contributions are welcome.
Licence
See LICENSE for details.
Coffee
- This work is created and maintained in my free time. If you would like to buy me a coffee for this work, it would be appreciated.