Why can't I access XGBClassifier's feature importances from a subclass?


I've been banging my head against this quirky behavior of XGBClassifier, which is supposed to behave just like RandomForestClassifier:

import xgboost as xgb 
from sklearn.ensemble import RandomForestClassifier

class my_rf(RandomForestClassifier):
    def important_features(self, X):
        return super(RandomForestClassifier, self).feature_importances_         

class my_xgb(xgb.XGBClassifier):
    def important_features(self, X):
        return super(xgb.XGBClassifier, self).feature_importances_          

c1 = my_rf()
c1.fit(X,y)
c1.important_features(X) #works

while this code fails :(

c2 = my_xgb()
c2.fit(X,y)
c2.important_features(X) #fails

I've stared at both code snippets and they look identical! What am I missing?? Apologies if this is a noob question; the mysteries of Python OOP are beyond me.


Edit:

If I use vanilla xgb without subclassing, everything works fine:

import xgboost as xgb
print "version:", xgb.__version__
c = xgb.XGBClassifier()
c.fit(X_train.as_matrix(), y_train.label)
print c.feature_importances_[:5]            

version: 0.4
[ 0.4039548   0.05932203  0.06779661  0.00847458  0.        ]

2 Answers

Your output shows you are running version 0.4. The repository tree of the last stable 0.4x release (published Jan 15, 2016) shows that the sklearn.py wrapper file did not yet have feature_importances_. That property was actually introduced in this commit from Feb 8, 2016.
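As a sanity check before relying on the property, you could inspect the installed class itself (a minimal sketch of my own, not from the original answer; it only assumes feature_importances_ is defined as a property on the class):

import xgboost as xgb

# feature_importances_ is defined as a property on the class, so we can look
# for it along the MRO without having to fit a model first.
has_importances = any("feature_importances_" in vars(klass)
                      for klass in xgb.XGBClassifier.__mro__)
print "xgboost", xgb.__version__, "exposes feature_importances_:", has_importances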

I cloned the current GitHub repository, built and installed xgboost from source, and the code works fine:

from sklearn import datasets
from sklearn.ensemble.forest import RandomForestClassifier
import xgboost as xgb
print "version:", xgb.__version__

class my_rf(RandomForestClassifier):
    def important_features(self, X):
        return super(RandomForestClassifier, self).feature_importances_ 

class my_xgb(xgb.XGBClassifier):
    def important_features(self, X):
        return super(xgb.XGBClassifier, self).feature_importances_

iris = datasets.load_iris()
X = iris.data
y = iris.target

c1 = my_rf()
c1.fit(X,y)
print c1.important_features(X)

c2 = my_xgb()
c2.fit(X,y)
print c2.important_features(X)

c3 = xgb.XGBClassifier()
c3.fit(X, y)
print c3.feature_importances_

Output:

version: 0.6
[ 0.0307026   0.01456868  0.45198349  0.50274523]
[ 0.17701453  0.11228534  0.41479525  0.29590487]
[ 0.17701453  0.11228534  0.41479525  0.29590487]

Edit:

If you are using XGBRegressor, make sure you cloned the repository after Dec 1, 2016, because according to this commit, that is when feature_importances_ was moved up into the base XGBModel, so that XGBRegressor can access it as well.

Adding this to the code above:

class my_xgb_regressor(xgb.XGBRegressor):
    def important_features(self, X):
        return super(xgb.XGBRegressor, self).feature_importances_

c4 = my_xgb_regressor()
c4.fit(X, y)
print c4.important_features(X)

Output:

version: 0.6
[ 0.0307026   0.01456868  0.45198349  0.50274523]
[ 0.17701453  0.11228534  0.41479525  0.29590487]
[ 0.17701453  0.11228534  0.41479525  0.29590487]
[ 0.25        0.17518248  0.34489051  0.229927  ]

As far as I know, feature_importances_ is not implemented in XGBoost. You can roll your own with something like permutation feature importance:

import random
import numpy as np
from sklearn.cross_validation import cross_val_score

def feature_importances(clf, X, y):
    # Baseline cross-validated score with the original features
    score = np.mean(cross_val_score(clf, X, y, scoring='roc_auc'))
    importances = {}
    for i in range(X.shape[1]):
        # Shuffle column i and measure how much the score drops
        X_perm = X.copy()
        X_perm[:,i] = random.sample(X[:,i].tolist(), X.shape[0])
        perm_score = np.mean(cross_val_score(clf, X_perm, y, scoring='roc_auc'))
        importances[i] = score - perm_score

    return importances
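For example, calling it with an XGBoost classifier (a hypothetical usage of my own; make_classification and the binary target are assumptions, chosen because the scorer hard-coded above is roc_auc):

import xgboost as xgb
from sklearn.datasets import make_classification

# Binary problem, since feature_importances above scores with roc_auc.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = xgb.XGBClassifier()
importances = feature_importances(clf, X, y)
for i, drop in sorted(importances.items()):
    print "feature %d: score drop %.4f" % (i, drop)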
