Interaction terms in an XGBoost classifier for binary logistic classification in Python

Posted 2024-04-25 01:33:19


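I have built a binary decision-tree model using the XGB binary logistic classifier in Python, and I use SHAP values to explain which variables have the most impact on the outcome. This is up and running, but I would now like to add interaction terms to the model to see whether any combinations of variables also have an effect. Does anyone know how I can adjust my existing code to add interaction terms?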

#load packages
import shap
import xgboost as xgb
from sklearn.metrics import mean_squared_error
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

#load data
df = pd.read_sql(query, con=cnxn)

#separate the dependent variable (first column) from the features
X, y = df.iloc[:,1:],df.iloc[:,0]

#create the decision tree
data_dmatrix = xgb.DMatrix(data=X,label=y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
xg_reg = xgb.XGBClassifier(
    objective='binary:logistic',
    colsample_bytree=0.3,
    learning_rate=0.1,
    max_depth=5,
    alpha=10,
    n_estimators=10,
)

xg_reg.fit(X_train,y_train)
preds_train = xg_reg.predict_proba(X_train)
preds = xg_reg.predict_proba(X_test)

#check feature importance
sorted_idx = np.argsort(xg_reg.feature_importances_)[::-1]
for index in sorted_idx:
    print([X_train.columns[index], xg_reg.feature_importances_[index]]) 

#check feature importance through Shapley values
explainerXGB = shap.TreeExplainer(xg_reg)
shap_values_XGB_test = explainerXGB.shap_values(X_test)
shap_values_XGB_train = explainerXGB.shap_values(X_train)

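
If explicit interaction terms are wanted as model inputs, a common approach is to add product columns to `X` before the train/test split. Tree ensembles with `max_depth` greater than 1 can already represent interactions, so this step is optional. The helper below, `add_pairwise_interactions`, is a hypothetical name introduced for illustration, and it assumes the features are numeric.

```python
from itertools import combinations

import pandas as pd


def add_pairwise_interactions(frame, columns=None):
    """Return a copy of `frame` with one product column per pair of columns.

    If `columns` is None, every numeric column is used.
    """
    out = frame.copy()
    if columns is None:
        columns = frame.select_dtypes(include="number").columns
    for left, right in combinations(columns, 2):
        out[f"{left}_x_{right}"] = frame[left] * frame[right]
    return out


# Tiny demonstration frame (made-up values, not the question's data).
demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
expanded = add_pairwise_interactions(demo)
print(list(expanded.columns))
```

In the question's script this would mean something like `X = add_pairwise_interactions(X)` just after `X` and `y` are separated, so that the product columns appear in both the feature-importance listing and the SHAP values. The number of pairs grows quadratically with the number of features, so passing a short `columns` list may be preferable on wide tables.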
