<p>Here is a summary of two different solutions, based on the previous answers, along with how each one behaves in my case.</p>
<pre><code>import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Load the data with categorical features.
mushrooms = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data", header=None)
# Convert the categorical features to numeric: solution 1.
# The encoder is refit on every column, so codes are assigned per column.
labelEncoder = LabelEncoder()
mushroomsNumeric = mushrooms.apply(labelEncoder.fit_transform)
# Convert the categorical features to numeric: solution 2.
# factorize runs over all values at once, so codes are shared across columns.
mushroomsNumeric2 = pd.DataFrame(
    pd.factorize(mushrooms.values.ravel())[0].reshape(mushrooms.shape),
    mushrooms.index, mushrooms.columns)
mushroomsNumeric.head(5)
Out[35]:
    0   1   2   3   4   5   6   7   8   9  ...  13  14  15  16  17  18  19  20  \
0   1   5   2   4   1   6   1   0   1   4  ...   2   7   7   0   2   1   4   2
1   0   5   2   9   1   0   1   0   0   4  ...   2   7   7   0   2   1   4   3
2   0   0   2   8   1   3   1   0   0   5  ...   2   7   7   0   2   1   4   3
3   1   5   3   8   1   6   1   0   1   5  ...   2   7   7   0   2   1   4   2
4   0   5   2   3   0   5   1   1   0   4  ...   2   7   7   0   2   1   0   3

   21  22
0   3   5
1   2   1
2   2   3
3   3   5
4   0   1

[5 rows x 23 columns]
mushroomsNumeric2.head(5)
Out[36]:
    0   1   2   3   4   5   6   7   8   9  ...  13  14  15  16  17  18  19  20  \
0   0   1   2   3   4   0   5   6   3   7  ...   2   9   9   0   9  10   0   7
1   8   1   2  12   4  13   5   6  14   7  ...   2   9   9   0   9  10   0   3
2   8  14   2   9   4  16   5   6  14   3  ...   2   9   9   0   9  10   0   3
3   0   1  12   9   4   0   5   6   3   3  ...   2   9   9   0   9  10   0   7
4   8   1   2  15   5   3   5   9  14   7  ...   2   9   9   0   9  10   8   3

   21  22
0   2  11
1   3  15
2   3  17
3   2  11
4  13  15

[5 rows x 23 columns]
</code></pre>
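<p>To make the difference between the two encodings easier to see, here is a minimal sketch on a small made-up frame. The <code>toy</code> data and the names <code>per_column</code> and <code>shared</code> are my own illustration, not part of the mushroom dataset; the two conversions are the same calls as above.</p>

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the mushroom table.
toy = pd.DataFrame({0: ["p", "e", "e"], 1: ["x", "x", "b"]})

# Solution 1: the encoder is refit on each column, so codes follow
# the sorted order of the labels within that column only.
per_column = toy.apply(LabelEncoder().fit_transform)

# Solution 2: factorize sees the flattened frame, so codes follow the
# order of first appearance and are shared across all columns.
shared = pd.DataFrame(
    pd.factorize(toy.values.ravel())[0].reshape(toy.shape),
    toy.index, toy.columns)

print(per_column)
print(shared)
```

<p>With solution 1 a code is only meaningful within its own column, while with solution 2 the same code in two columns means the same original string. Which one fits depends on whether the columns share a vocabulary; for independent categorical features the per-column encoding is probably the more natural choice.</p>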