XGBoost gives slightly different predictions for a list and for an array. Is this expected?

print(test_feats)
>> [[23.0, 3.0, 35.0, 0.28, -3.0, 18.0, 0.0, 0.0, 0.0, 3.33, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 39.0, 36.0, 113.0, 76.0, 0.0, 0.0, 1.0, 0.34, -999.0, -999.0, -999.0, -999.0, -999.0, -999.0, -999.0, -999.0, 0.0, 25.0, 48.0, 48.0, 0.0, 29.0, 52.0, 53.0, 99.0, 368.0, 676.0, 691.0, 4.0, 9.0, 12.0, 13.0]]

print(sum(test_feats[0]) == array_test_feats.sum()) print(test_feats == array_test_feats)) >> True >> array([[ True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]])

1 answer

Anonymous user

Answer #1 · posted 2024-10-01 07:49:56

You have just run into the issue described here: https://github.com/dmlc/xgboost/pull/3970

The documentation does not include lists as an allowed type for the data inputted into DMatrix. Despite this, a list can be passed in without an error. This change would prevent a list from being passed in directly.
I experienced an issue where passing in a list vs a np.array resulted in different predictions (sometimes over 10% relative difference) for the same data. Though these differences were infrequent (~1.5% of cases tested), in certain applications this could cause serious issues.

Essentially, passing a Python list directly is not officially supported by XGBoost, but it works anyway because it hits a fall-through case in XGBoost's data conversion.
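The sketch below is an illustration added here, not code from the post or from XGBoost. It assumes, as the answer describes, that the fall-through path turns the list into a CSR (compressed sparse row) matrix, and that this conversion stores only non-zero values, as building a CSR matrix from dense input normally does. The helper `dense_to_csr` is hypothetical; it is applied to the row from the question.

```python
# Hypothetical helper, for illustration only (not XGBoost or SciPy source):
# build the three CSR arrays for dense rows, storing only non-zero values.

def dense_to_csr(rows):
    indptr, indices, data = [0], [], []
    for row in rows:
        for j, value in enumerate(row):
            if value != 0.0:              # explicit zeros are not stored
                indices.append(j)
                data.append(value)
        indptr.append(len(indices))       # end of this row in indices/data
    return indptr, indices, data

# The row printed in the question.
test_feats = [[23.0, 3.0, 35.0, 0.28, -3.0, 18.0, 0.0, 0.0, 0.0, 3.33, 0.0, 1.0,
               0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 39.0, 36.0, 113.0,
               76.0, 0.0, 0.0, 1.0, 0.34, -999.0, -999.0, -999.0, -999.0,
               -999.0, -999.0, -999.0, -999.0, 0.0, 25.0, 48.0, 48.0, 0.0,
               29.0, 52.0, 53.0, 99.0, 368.0, 676.0, 691.0, 4.0, 9.0, 12.0,
               13.0]]

indptr, indices, data = dense_to_csr(test_feats)
n_features = len(test_feats[0])
stored = set(indices)
dropped = [j for j in range(n_features) if j not in stored]  # zero-valued columns

print(n_features, len(data))  # 53 38
print(dropped)
```

Every value is identical in both inputs, which is why the element-wise comparison in the question reports True everywhere; the difference lies in which entries are stored. The exact zeros are absent from the sparse version, while the -999.0 sentinels, being non-zero, are kept.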

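This fall-through makes XGBoost build the underlying matrix for the data with the XGDMatrixCreateFromCSREx function instead of XGDMatrixCreateFromMat, and missing elements then behave differently in the sparse and dense representations: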

"Sparse" elements are treated as "missing" by the tree booster and as zeros by the linear booster.
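To make the quoted rule concrete, here is a toy model in plain Python. The helpers `tree_stump` and `linear_model` are hypothetical and are not XGBoost internals; the split rule ("value < threshold goes left, an absent value follows a learned default branch") is a simplification assumed for illustration. The same row is given once with its zero kept as a value and once with the zero dropped, as in a sparse representation.

```python
# Toy illustration of "missing" vs "zero" (hypothetical helpers, not XGBoost code).

def tree_stump(entries, feature, threshold, default_left, left_leaf, right_leaf):
    """One split: values < threshold go left; an absent feature takes the default branch."""
    if feature in entries:
        go_left = entries[feature] < threshold
    else:
        go_left = default_left              # treated as missing
    return left_leaf if go_left else right_leaf

def linear_model(entries, weights, bias):
    """Absent features contribute nothing, so they behave exactly like 0.0."""
    return bias + sum(weights[j] * v for j, v in entries.items())

row = [23.0, 0.0, 1.0]
dense = {j: v for j, v in enumerate(row)}                # zero kept as a value
sparse = {j: v for j, v in enumerate(row) if v != 0.0}   # zero dropped (implicit)

# Split on feature 1 (value 0.0); missing values were learned to go right.
split = dict(feature=1, threshold=0.5, default_left=False,
             left_leaf=-1.0, right_leaf=1.0)
print(tree_stump(dense, **split), tree_stump(sparse, **split))  # -1.0 1.0

weights, bias = [0.5, 2.0, -0.25], 0.0
print(linear_model(dense, weights, bias), linear_model(sparse, weights, bias))  # 11.25 11.25
```

The tree prediction flips because 0.0 is read as a value in one representation and as missing in the other, while the linear prediction is unchanged. In a real model, only rows where some split uses a zero-valued feature, and where the default branch differs from the branch 0.0 would take, are affected, which fits the infrequent but occasionally large differences reported in the PR. Converting the list with np.array before creating the DMatrix should keep explicit zeros as ordinary values.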
