<p><strong>XGBoost has also added experimental support for categorical encoding since version 1.3.0</strong></p>
<p>Copying my answer from <a href="https://stackoverflow.com/questions/46442266/categorical-variables-with-large-amounts-of-categories-in-xgboost-catboost">another question</a>.</p>
<p><strong>Nov 23, 2020</strong></p>
<p>XGBoost has had experimental support for categorical features since version 1.3.0. From the docs:</p>
<blockquote>
<p><strong>1.8.7 Categorical Data</strong></p>
<p>Other than users performing encoding, XGBoost has experimental support
for categorical data using <em>gpu_hist</em> and <em>gpu_predictor</em>. No special
operation needs to be done on input test data since the information
about categories is encoded into the model during training.</p>
</blockquote>
<p><a href="https://buildmedia.readthedocs.org/media/pdf/xgboost/latest/xgboost.pdf" rel="nofollow noreferrer">https://buildmedia.readthedocs.org/media/pdf/xgboost/latest/xgboost.pdf</a></p>
<p>In the DMatrix section, the docs also say:</p>
<blockquote>
<p>enable_categorical (boolean, optional) – New in version 1.3.0.</p>
<p>Experimental support of specializing for categorical features. Do not
set to True unless you are interested in development. Currently it’s
only available for gpu_hist tree method with 1 vs rest (one hot)
categorical split. Also, JSON serialization format, gpu_predictor and
pandas input are required.</p>
</blockquote>
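<p>To make the "1 vs rest (one hot) categorical split" in that quote concrete, here is a minimal pure-Python sketch of the idea. It is not XGBoost's implementation: the function names are hypothetical, and it scores each candidate split by reduction in squared error, a simplified stand-in for the gradient-based gain a real booster would use.</p>

```python
from collections import defaultdict


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_one_vs_rest_split(categories, targets):
    """Return (category, gain) for the best 'value == category' vs. rest split.

    Hypothetical helper for illustration only. Gain is the reduction in
    squared error, not the gradient/hessian gain XGBoost computes.
    """
    parent = sse(targets)

    # Group target values by category so each candidate's left child is ready.
    by_cat = defaultdict(list)
    for c, y in zip(categories, targets):
        by_cat[c].append(y)

    best_cat, best_gain = None, 0.0
    for cat, left in by_cat.items():
        # Right child: every row whose category differs from the candidate.
        right = [y for c, y in zip(categories, targets) if c != cat]
        if not right:  # only one category present, nothing to split on
            continue
        gain = parent - (sse(left) + sse(right))
        if gain > best_gain:
            best_cat, best_gain = cat, gain
    return best_cat, best_gain


# Toy data: "red" rows have clearly larger targets than the others.
colors = ["red", "green", "blue", "red", "green", "blue"]
y = [10.0, 1.0, 2.0, 12.0, 1.5, 2.5]
split_cat, split_gain = best_one_vs_rest_split(colors, y)
print(split_cat, round(split_gain, 3))
```

<p>Each candidate split sends one category to one child and all remaining categories to the other. That is the same partition a split on a single one-hot column would produce, but the feature itself never has to be expanded into one column per category.</p>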
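<p><strong>Other model options:</strong></p>
<p>If you don't have to use XGBoost, you can use a model such as <strong>LightGBM</strong> or <strong>CatBoost</strong>, both of which handle categorical features out of the box, without one-hot encoding.</p>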