Significant differences in feature importance between very similar CatBoost models trained on the very same data

    from catboost import CatBoostClassifier

    # Initialize CatBoostClassifier
    model = CatBoostClassifier(
        # custom_loss=['Accuracy'],
        depth=9,
        random_seed=42,
        l2_leaf_reg=1,
        # has_time=True,
        iterations=300,
        learning_rate=0.05,
        loss_function='Logloss',
        logging_level='Verbose',
    )

    # Fit the CatBoost model
    model.fit(
        train_set.values, Y_train.values,
        cat_features=categorical_features_indices,
        eval_set=(test_set.values, Y_test),
        # logging_level='Verbose'  # you can uncomment this for text output
    )

    model = CatBoostClassifier(
        # custom_loss=['Accuracy'],
        depth=9,
        random_seed=42,
        l2_leaf_reg=1,
        # has_time=True,
        iterations='bestIteration from model1',  # placeholder, not a literal value
        learning_rate=0.05,
        loss_function='Logloss',
        logging_level='Verbose',
    )

    # Fit the CatBoost model
    model.fit(
        train.values, Y.values,
        cat_features=categorical_features_indices,
        # logging_level='Verbose'  # you can uncomment this for text output
    )

      Feature  Score_m1  Score_m2     delta
    0      x0  3.612309  2.013193 -1.399116
    1      x1  3.390630  3.121273 -0.269357
    2      x2  2.762750  1.822564 -0.940186
    3      x3  2.553052       NaN       NaN
    4      x4  2.400786  0.329625 -2.071161

1 answer

User

#1 · Posted on 2024-09-27 09:32:20

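Yes. In general, decision trees are somewhat unstable. If you remove even the least important feature, you can end up with a completely different model.

Having more data reduces this tendency.

Having more features increases it.

Tree-based algorithms are inherently randomized, so results vary from run to run.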

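Things to try:

Run the model several times, each time with a different random seed. Use the results to determine which features consistently appear least important. (How many features do you have?)
Try balancing your training set. This may require you to resample the rarer cases.
Get more data. You may have to combine your training and test sets and use a holdout set for testing.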
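
The first suggestion above could be sketched roughly as follows. This is only an illustration, not the answerer's code: `importance_across_seeds` and `fit_catboost` are hypothetical helper names, and the hyperparameters are simply copied from the question. The only CatBoost behaviour assumed is that a fitted `CatBoostClassifier` exposes `get_feature_importance()`.

```python
from statistics import mean, stdev


def importance_across_seeds(fit_model, feature_names, seeds):
    """Fit one model per seed and summarise each feature's importance.

    fit_model(seed) must return a fitted model that exposes
    get_feature_importance(), as CatBoostClassifier does.
    Returns (name, mean importance, standard deviation) tuples.
    """
    runs = [list(fit_model(seed).get_feature_importance()) for seed in seeds]
    summary = []
    for i, name in enumerate(feature_names):
        scores = [run[i] for run in runs]
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary.append((name, mean(scores), spread))
    # Highest mean importance first. A large spread relative to the mean
    # suggests the feature's ranking depends heavily on the seed.
    return sorted(summary, key=lambda row: row[1], reverse=True)


def fit_catboost(seed, X, y, cat_features):
    """Hypothetical helper: the question's settings with a different seed."""
    from catboost import CatBoostClassifier  # imported lazily

    model = CatBoostClassifier(
        depth=9,
        l2_leaf_reg=1,
        iterations=300,
        learning_rate=0.05,
        loss_function='Logloss',
        random_seed=seed,
        logging_level='Silent',
    )
    model.fit(X, y, cat_features=cat_features)
    return model
```

With the question's variables, a call might look like `importance_across_seeds(lambda s: fit_catboost(s, train.values, Y.values, categorical_features_indices), list(train.columns), range(5))`. Features whose mean importance stays low across all seeds are safer candidates for removal than features that merely rank last in a single run.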
