<p>This is not the right way to encode categorical data.</p>
<p><strong>To achieve what you want, you need to use <code>sklearn.preprocessing.LabelEncoder</code>.</strong></p>
<pre><code>import pandas as pd
from sklearn.preprocessing import LabelEncoder
df = pd.DataFrame({'a': ['unacc', 'acc', 'good', 'vgood']})
label_encoder = LabelEncoder()
label_encoder.fit(df['a'])
# Putting in encoded categories into another column `encoded`
df['encoded'] = label_encoder.transform(df['a'])
print(df)
# This prints the following `df`:
#        a  encoded
# 0  unacc        2
# 1    acc        0
# 2   good        1
# 3  vgood        3
</code></pre>
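<p>After <code>fit</code> has been called, <code>label_encoder</code> holds all the information needed to convert categories to integers. Note that it will not transform values it has not seen. For example, if I run:</p>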
<pre><code>label_encoder.transform(['a', 'b'])
</code></pre>
<p>This raises a <code>ValueError</code>, because neither 'a' nor 'b' was seen during the call to <code>fit</code>.
<br/><br/>
<strong>How to decode the integers back to labels:</strong></p>
<pre><code># Just like `transform`, we also have `inverse_transform`.
df['decoded'] = label_encoder.inverse_transform(df['encoded'])
print(df)
# This will print something like:
#        a  encoded decoded
# 0  unacc        2   unacc
# 1    acc        0     acc
# 2   good        1    good
# 3  vgood        3   vgood
</code></pre>
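<p>So first I encoded column 'a' and stored the encoded values in the 'encoded' column. Then, to test <code>inverse_transform</code>, I called it on the encoded values (the ones in the 'encoded' column) and stored the result in the 'decoded' column.</p>
<p>Columns 'a' and 'decoded' should be identical, and they are.</p>
<p>You can also print the classes that <code>label_encoder</code> identified after <code>fit</code> was called:</p>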
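<p>If you would rather check the round trip programmatically than compare the columns by eye, here is a minimal sketch. It rebuilds the same <code>df</code> so it runs on its own, and uses <code>fit_transform</code> as a shortcut for <code>fit</code> followed by <code>transform</code>. The variable name <code>round_trip_ok</code> is my own, chosen for illustration.</p>

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'a': ['unacc', 'acc', 'good', 'vgood']})
label_encoder = LabelEncoder()

# fit_transform learns the classes and encodes in one step
df['encoded'] = label_encoder.fit_transform(df['a'])
df['decoded'] = label_encoder.inverse_transform(df['encoded'])

# True only if every decoded label equals the original label
round_trip_ok = bool((df['a'] == df['decoded']).all())
print(round_trip_ok)
```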
<pre><code>print(label_encoder.classes_)
# This will print
# ['acc' 'good' 'unacc' 'vgood']
</code></pre>
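<p>Because the integer code of each label is simply its position in <code>classes_</code>, you can build an explicit label-to-code mapping from it, and also use it to filter out unseen labels before calling <code>transform</code>. The following is only a sketch of one way to do that. The names <code>label_to_code</code>, <code>candidates</code>, <code>seen</code> and <code>unseen</code> are hypothetical, and silently skipping unseen labels may not be what your application needs.</p>

```python
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
label_encoder.fit(['unacc', 'acc', 'good', 'vgood'])

# classes_ is sorted; a label's index in it is its integer code
classes = label_encoder.classes_.tolist()
label_to_code = {label: code for code, label in enumerate(classes)}
print(label_to_code)

# Split incoming labels into those seen during fit and those not seen
candidates = ['good', 'a', 'b']
seen = [label for label in candidates if label in label_to_code]
unseen = [label for label in candidates if label not in label_to_code]

# Only the seen labels are safe to pass to transform
codes = label_encoder.transform(seen)
print(codes, unseen)
```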
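<p>Note: I put the result of <code>transform()</code> (which returns a <code>numpy.array</code>) in the 'encoded' column of the same df, and the result of <code>inverse_transform()</code> in the 'decoded' column, only to demonstrate that the decoded values must match the initial values.</p>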
<p><a href="https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html" rel="nofollow noreferrer">LabelEncoder scikit-learn documentation</a></p>