Why do the Python and CLI versions of XGBoost give different predictions?

Posted 2024-06-25 23:41:45

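Recently, when I tried to use the CLI version of xgboost to predict on some input, I found that its results differ greatly from those of the Python version.

In Python, I predict as follows: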

import xgboost as xgb

data = xgb.DMatrix(X)
bst = xgb.Booster()
bst.load_model(modelfile)
# pred_leaf=False returns predicted probabilities, not leaf indices
leafindex = bst.predict(data, pred_leaf=False)

And I use the CLI as follows:

[CLI command missing from the source page]

This is my configuration file:

# General Parameters, see comment for each definition
# can be gbtree or gblinear
booster = gbtree
# choose logistic regression loss function for binary classification
objective = binary:logistic

# Tree Booster Parameters
# step size shrinkage
eta = 1.0
# minimum loss reduction required to make a further partition
gamma = 1.0
# minimum sum of instance weight(hessian) needed in a child
min_child_weight = 1
# maximum depth of a tree
max_depth = 4

# Task Parameters
# the number of boosting rounds
num_round = 150
# 0 means do not save any model except the final round model
save_period = 0
# The path of training data
data = "agaricus.txt.train"
# The path of validation data, used to monitor training process, here [test] sets name of the validation set
eval[test] = "agaricus.txt.test"
# The path of test data
test:data = "data"

Python input data format:

8       201     1       2       26      10000.0 8589934592      32      0       0       1000000.0       0
2       3       1       1       50      10000.0 8589934592      32      524288  8       1000000.0       0
2       3       2       2       19      10000.0 8589934592      512     512     8       1000000.0       0
4       24      1       1       23      10000.0 8589934592      8192    0       0       1000000.0       0
1       2       2       3       50      10000.0 8589934592      32      512     8       1000000.0       0
21      1       2       3       48      10000.0 8589934592      32      512     8       1000000.0       0
5       12      1       2       42      10000.0 137438953472    32      512     8       1000000.0       0
2       11      2       2       86      10000.0 0       0       0       0       1000000.0       0
1       10      2       8       99      10000.0 8589934592      32      65536   8       1000000.0       0
2       11      2       8       97      10000.0 8589934592      32      65536   8       1000000.0       0
4       5       1       1       4       10000.0 1073741824      32      0       0       1000000.0       0
...

CLI input format:

0 1:8 2:201 3:1 4:2 5:26 6:10000.0 7:8589934592 8:32 9:0 10:0 11:1000000.0 12:0
0 1:2 2:3 3:1 4:1 5:50 6:10000.0 7:8589934592 8:32 9:524288 10:8 11:1000000.0 12:0
0 1:2 2:3 3:2 4:2 5:19 6:10000.0 7:8589934592 8:512 9:512 10:8 11:1000000.0 12:0
0 1:4 2:24 3:1 4:1 5:23 6:10000.0 7:8589934592 8:8192 9:0 10:0 11:1000000.0 12:0
0 1:1 2:2 3:2 4:3 5:50 6:10000.0 7:8589934592 8:32 9:512 10:8 11:1000000.0 12:0
0 1:21 2:1 3:2 4:3 5:48 6:10000.0 7:8589934592 8:32 9:512 10:8 11:1000000.0 12:0
0 1:5 2:12 3:1 4:2 5:42 6:10000.0 7:137438953472 8:32 9:512 10:8 11:1000000.0 12:0
...
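
To make the relationship between the two input formats concrete, here is a minimal sketch that turns one whitespace-separated dense row into a LIBSVM-style line like those shown above. The function name `dense_to_libsvm` and its `label` and `base` parameters are hypothetical and are not part of xgboost. The label `0` is just the placeholder used in the CLI file, and `base=1` mirrors the 1-based indices that appear there. One thing that may be worth checking is whether the index base in the text file lines up with the column order of the matrix passed to `xgb.DMatrix` in Python, since a shift by one would pair every value with a different feature.

```python
def dense_to_libsvm(line, label=0, base=1):
    """Convert one whitespace-separated dense row to a LIBSVM-style line.

    `label` and `base` are illustrative parameters (assumptions), not xgboost options.
    """
    values = line.split()  # keep the original string form of each value
    pairs = [f"{i}:{v}" for i, v in enumerate(values, start=base)]
    return " ".join([str(label)] + pairs)


# First row of the Python input shown above
row = "8       201     1       2       26      10000.0 8589934592      32      0       0       1000000.0       0"
print(dense_to_libsvm(row))          # indices start at 1, as in the CLI file
print(dense_to_libsvm(row, base=0))  # same row with indices starting at 0
```

With `base=1` the first call reproduces the first line of the CLI input above.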

Results from the Python version:

0.138298
0.00288907
0.0114002
0.0477143
0.00185653
0.00455882
0.000503023
0.000817317
0.00332584
0.00178041
0.0666806
0.03003
...

Results from the CLI version:

0.000100178
0.201246
0.449562
0.0506984
0.451953
0.389587
0.034748
0.992795
0.00348666
0.00661674
0.0186095
0.0260032
0.996163
0.259104
0.552341
0.972762
...

I used the same model file in both cases, yet about 40% of the CLI predictions are above 0.5, which is not what we expect.


1 answer
User
#1 · Posted 2024-06-25 23:41:45

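Solved!

It seems that model files trained by Python and by the CLI cannot be used interchangeably. Even when each side uses a model it trained itself, the results still differ somewhat, as shown below: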

by python       by cli
0.169874        0.222063
0.999997        0.999554
0.00454239      0.000879413
0.0140518       0.00824018
0.0148116       0.00859811
0.000353913     0.000880754
0.0207635       0.019058
0.000916939     0.000579058
0.00109237      0.000286653
0.00247333      0.00272115
0.0650928       0.0319875
0.946068        0.965301
0.997704        0.999615
0.987644        0.991665
0.997242        0.984403
0.948666        0.909703
0.000781899     0.00079996
0.000319449     0.000138011
0.0400793       0.164134
0.00216081      0.000781626
0.023867        0.0323994
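
As a rough way to judge how large these remaining differences are, the sketch below compares the two columns of the table above. It reports the largest absolute gap and how many pairs fall on the same side of a 0.5 decision threshold. The `compare` helper and the 0.5 threshold are assumptions made for illustration; the values are copied from the table.

```python
# (python_pred, cli_pred) pairs copied from the table above
pairs = [
    (0.169874, 0.222063), (0.999997, 0.999554), (0.00454239, 0.000879413),
    (0.0140518, 0.00824018), (0.0148116, 0.00859811), (0.000353913, 0.000880754),
    (0.0207635, 0.019058), (0.000916939, 0.000579058), (0.00109237, 0.000286653),
    (0.00247333, 0.00272115), (0.0650928, 0.0319875), (0.946068, 0.965301),
    (0.997704, 0.999615), (0.987644, 0.991665), (0.997242, 0.984403),
    (0.948666, 0.909703), (0.000781899, 0.00079996), (0.000319449, 0.000138011),
    (0.0400793, 0.164134), (0.00216081, 0.000781626), (0.023867, 0.0323994),
]


def compare(pairs, threshold=0.5):
    """Return (largest absolute gap, number of pairs on the same side of threshold)."""
    diffs = [abs(p - c) for p, c in pairs]
    same_side = sum((p > threshold) == (c > threshold) for p, c in pairs)
    return max(diffs), same_side


max_diff, same_side = compare(pairs)
print(f"largest absolute gap: {max_diff:.4f}")
print(f"pairs on the same side of 0.5: {same_side}/{len(pairs)}")
```

For these 21 rows the largest gap is about 0.124, and every pair lands on the same side of 0.5, so the two separately trained models agree on the predicted class here even though the probabilities differ.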
