vimpy: nonparametric variable importance assessment in Python
Author: Brian Williamson
Introduction
In prediction modeling applications, it is often of interest to determine the relative contribution of subsets of features in explaining an outcome; this is often referred to as variable importance. It is useful to think of variable importance as a function of the unknown, underlying data-generating mechanism, rather than of the particular prediction algorithm used to fit the data. Given fitted values from a prediction algorithm, this package provides functions that compute nonparametric estimates of deviance-based and variance-based variable importance, along with asymptotically valid confidence intervals for the true importance.
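To make the idea concrete, the variance-based importance of a feature group can be thought of as the variance explained by that group, scaled by the total variance of the outcome. Below is a minimal hand-rolled sketch of a naive plug-in estimate of this quantity, using plain NumPy and a linear regression as a stand-in for a flexible learner; this illustrates the general idea only and is not vimpy's API:

```python
## a minimal sketch of variance-based variable importance, assuming the
## importance of feature j is Var(E[Y|X] - E[Y|X_{-j}]) / Var(Y),
## estimated with a naive plug-in (linear regressions as stand-in learners)
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(4747)
n = 1000
x = np.random.normal(0, 1, (n, 3))
y = 2 * x[:, 0] + np.random.normal(0, 1, n)  # only the first feature matters

def plugin_importance(x, y, j):
    ## fit the full regression of y on all features
    full = LinearRegression().fit(x, y).predict(x)
    ## fit the reduced regression of the full fit on the remaining features
    x_small = np.delete(x, j, axis = 1)
    small = LinearRegression().fit(x_small, full).predict(x_small)
    ## plug-in estimate: variance explained by feature j, scaled by Var(Y)
    return np.mean((full - small) ** 2) / np.var(y)

psi_0 = plugin_importance(x, y, 0)  # important feature: large importance
psi_2 = plugin_importance(x, y, 2)  # noise feature: importance near zero
```

The naive plug-in estimator sketched here is biased in general; the package's corrected (one-step) estimator removes the first-order bias and supports valid confidence intervals.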
Installation

The stable version of vimpy can be installed using pip by running pip install vimpy from a terminal window. Alternatively, you can install it within a virtualenv environment.

The current development version of vimpy can be installed by downloading this repository directly.
Issues

If you encounter any bugs or have any specific feature requests, please file an issue.
Example

This example shows how to use vimpy in a simple setting with simulated data and a single regression function. For more examples and detailed documentation, see the R vignette (in progress).
```python
## load required libraries
import numpy as np
import vimpy
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

## -------------------------------------------------------------
## problem setup
## -------------------------------------------------------------
## define a function for the conditional mean of Y given X
def cond_mean(x = None):
    f1 = np.where(np.logical_and(-2 <= x[:, 0], x[:, 0] < 2), np.floor(x[:, 0]), 0)
    f2 = np.where(x[:, 1] <= 0, 1, 0)
    f3 = np.where(x[:, 2] > 0, 1, 0)
    f6 = np.absolute(x[:, 5] / 4) ** 3
    f7 = np.absolute(x[:, 6] / 4) ** 5
    f11 = (7. / 3) * np.cos(x[:, 10] / 2)
    ret = f1 + f2 + f3 + f6 + f7 + f11
    return ret

## create data
np.random.seed(4747)
n = 100
p = 15
s = 1  # importance desired for X_1
x = np.zeros((n, p))
for i in range(0, x.shape[1]):
    x[:, i] = np.random.normal(0, 2, n)

y = cond_mean(x) + np.random.normal(0, 1, n)

## -------------------------------------------------------------
## preliminary step: get regression estimators
## -------------------------------------------------------------
## use grid search to get optimal number of trees and learning rate
ntrees = np.arange(100, 3500, 500)
lr = np.arange(.01, .5, .05)

param_grid = [{'n_estimators': ntrees, 'learning_rate': lr}]

## set up cv objects
## (note: scikit-learn >= 1.2 uses loss = 'squared_error' instead of 'ls')
cv_full = GridSearchCV(GradientBoostingRegressor(loss = 'ls', max_depth = 1), param_grid = param_grid, cv = 5)
cv_small = GridSearchCV(GradientBoostingRegressor(loss = 'ls', max_depth = 1), param_grid = param_grid, cv = 5)

## fit the full regression
cv_full.fit(x, y)
full_fit = cv_full.best_estimator_.predict(x)

## fit the reduced regression
x_small = np.delete(x, s, 1)  # delete the columns in s
cv_small.fit(x_small, full_fit)
small_fit = cv_small.best_estimator_.predict(x_small)

## -------------------------------------------------------------
## get variable importance estimates
## -------------------------------------------------------------
## set up the vimp object
vimp = vimpy.vimp_regression(y, x, full_fit, small_fit, s)
## get the naive estimator
vimp.plugin()
## get the corrected estimator
vimp.update()
vimp.onestep_based_estimator()
## get a standard error
vimp.onestep_based_se()
## get a confidence interval
vimp.get_ci()

## -------------------------------------------------------------
## get variable importance estimates using cross-validation
## -------------------------------------------------------------
## set up the cross-fitting folds (assigned uniformly at random)
V = 2
folds = np.random.choice(a = np.arange(V), size = n, replace = True)
full_fits = [None] * V
small_fits = [None] * V
for v in range(V):
    cv_full.fit(x[folds == v, :], y[folds == v])
    full_fits[v] = cv_full.best_estimator_.predict(x[folds == v, :])
    x_small = np.delete(x[folds == v, :], s, 1)  # delete the columns in s
    cv_small.fit(x_small, full_fits[v])
    small_fits[v] = cv_small.best_estimator_.predict(x_small)

## set up the outcome and vimp object
ys = [y[folds == v] for v in range(V)]
vimp_cv = vimpy.cv_vim(ys, x, full_fits, small_fits, V, folds, "regression", s)
## get the naive estimator
vimp_cv.plugin()
## get the corrected estimator
vimp_cv.update()
vimp_cv.onestep_based_estimator()
## get a standard error
vimp_cv.onestep_based_se()
## get a confidence interval
vimp_cv.get_ci()
```