What is the fastest way to convert categorical data with multiple features to numeric in Python?

# Load the data. The features are categorical.
mushrooms <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data", header = FALSE, stringsAsFactors = TRUE)

# Convert the features to numeric. The features are stored in columns.
mushroomsNumeric <- data.frame(lapply(mushrooms, as.numeric))

# View the first 5 samples of the original data.
mushrooms[1:5,]

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23
1  p  x  s  n  t  p  f  c  n   k   e   e   s   s   w   w   p   w   o   p   k   s   u
2  e  x  s  y  t  a  f  c  b   k   e   c   s   s   w   w   p   w   o   p   n   n   g
3  e  b  s  w  t  l  f  c  b   n   e   c   s   s   w   w   p   w   o   p   n   n   m
4  p  x  y  w  t  p  f  c  n   n   e   e   s   s   w   w   p   w   o   p   k   s   u
5  e  x  s  g  f  n  f  w  b   k   t   e   s   s   w   w   p   w   o   e   n   a   g

# View the first 5 samples of the converted data.
mushroomsNumeric[1:5,]

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23
1  2  6  3  5  2  7  2  1  2   5   1   4   3   3   8   8   1   3   2   5   3   4   6
2  1  6  3 10  2  1  2  1  1   5   1   3   3   3   8   8   1   3   2   5   4   3   2
3  1  1  3  9  2  4  2  1  1   6   1   3   3   3   8   8   1   3   2   5   4   3   4
4  2  6  4  9  2  7  2  1  2   6   1   4   3   3   8   8   1   3   2   5   3   4   6
5  1  6  3  4  1  6  2  2  1   5   2   4   3   3   8   8   1   3   2   1   4   1   2

3 answers

User

Answer 1 · edited on 2024-09-20 04:10:34

You can also use LabelEncoder from the sklearn library.

import pandas as pd
from sklearn.preprocessing import LabelEncoder
lbl = LabelEncoder()

# sample data
df = pd.DataFrame({'V1': ['a','b','a','d'],
                   'V2':['c','d','d','c']})

# apply function
df.apply(lbl.fit_transform)

   V1   V2
0   0   0
1   1   1
2   0   1
3   2   0
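
One thing to keep in mind with the snippet above: a single LabelEncoder is refit on every column, so after the apply call only the mapping of the last column is still stored. If you need to decode the numbers later, a minimal sketch is to keep one encoder per column. The names encoders, encoded and decoded are just illustrative, not part of the original answer.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Same sample data as in the answer above.
df = pd.DataFrame({'V1': ['a', 'b', 'a', 'd'],
                   'V2': ['c', 'd', 'd', 'c']})

# Fit one encoder per column so every column keeps its own mapping.
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}

# Encode each column with the encoder that was fitted on it.
encoded = pd.DataFrame(
    {col: encoders[col].transform(df[col]) for col in df.columns},
    index=df.index)

# Decode back to the original labels using the stored encoders.
decoded = pd.DataFrame(
    {col: encoders[col].inverse_transform(encoded[col]) for col in df.columns},
    index=df.index)

print(encoded)
print(decoded)
```

The encoded frame matches the output shown above, and encoders['V1'].classes_ holds the labels of V1 in the order of their codes.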

User

Answer 2 · edited on 2024-09-20 04:10:34

Use pd.factorize:

import pandas as pd

def f(x):
    # pd.factorize returns (codes, uniques); keep only the integer codes.
    return pd.factorize(x)[0]

To factorize each column:

df.apply(f)

To factorize each row:

df.apply(f, 1)

To factorize the entire DataFrame together:

pd.DataFrame(
    pd.factorize(df.values.ravel())[0].reshape(df.shape),
    df.index, df.columns
)
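
The helper f above throws away the second return value of pd.factorize. If you want to map the codes back to labels, it may help to keep it. A small sketch, assuming the same sample df as in the first answer (the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'V1': ['a', 'b', 'a', 'd'],
                   'V2': ['c', 'd', 'd', 'c']})

# Single column: codes are integers, uniques holds the labels in code order.
codes, uniques = pd.factorize(df['V1'])
restored = uniques.take(codes)  # look each code up in uniques

# Whole frame: flatten row by row, factorize once, keep the shared uniques.
flat_codes, flat_uniques = pd.factorize(df.values.ravel())
restored_frame = pd.DataFrame(
    flat_uniques[flat_codes].reshape(df.shape),
    df.index, df.columns)

print(codes, list(uniques))
print(restored_frame)
```

Codes are assigned in order of first appearance, so in the whole-frame case the numbering follows the row-by-row order of the flattened values.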

User

Answer 3 · edited on 2024-09-20 04:10:34

Here is a summary of the two solutions from the previous answers and how they behave in my case.

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Load the data with categorical features.
mushrooms = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data", header = None)

# Convert the categorical features to numeric: solution 1.
labelEncoder = LabelEncoder()
mushroomsNumeric = mushrooms.apply(labelEncoder.fit_transform)

# Convert the categorical features to numeric: solution 2.
mushroomsNumeric2 = pd.DataFrame(
    pd.factorize(mushrooms.values.ravel())[0].reshape(mushrooms.shape),
    mushrooms.index, mushrooms.columns)

mushroomsNumeric.head(5)
Out[35]: 
   0   1   2   3   4   5   6   7   8   9  ...  13  14  15  16  17  18  19  20  \
0   1   5   2   4   1   6   1   0   1   4 ...   2   7   7   0   2   1   4   2   
1   0   5   2   9   1   0   1   0   0   4 ...   2   7   7   0   2   1   4   3   
2   0   0   2   8   1   3   1   0   0   5 ...   2   7   7   0   2   1   4   3   
3   1   5   3   8   1   6   1   0   1   5 ...   2   7   7   0   2   1   4   2   
4   0   5   2   3   0   5   1   1   0   4 ...   2   7   7   0   2   1   0   3   

   21  22  
0   3   5  
1   2   1  
2   2   3  
3   3   5  
4   0   1  

[5 rows x 23 columns]

mushroomsNumeric2.head(5)
Out[36]: 
   0   1   2   3   4   5   6   7   8   9  ...  13  14  15  16  17  18  19  20  \
0   0   1   2   3   4   0   5   6   3   7 ...   2   9   9   0   9  10   0   7   
1   8   1   2  12   4  13   5   6  14   7 ...   2   9   9   0   9  10   0   3   
2   8  14   2   9   4  16   5   6  14   3 ...   2   9   9   0   9  10   0   3   
3   0   1  12   9   4   0   5   6   3   3 ...   2   9   9   0   9  10   0   7   
4   8   1   2  15   5   3   5   9  14   7 ...   2   9   9   0   9  10   8   3   

   21  22  
0   2  11  
1   3  15  
2   3  17  
3   2  11  
4  13  15  

[5 rows x 23 columns]
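
The two outputs differ for two reasons: LabelEncoder numbers the labels of each column in sorted order, while pd.factorize numbers them in order of first appearance, and solution 2 factorizes the whole frame at once, so one set of codes is shared by all columns. A toy sketch of the three variants, using made-up data and illustrative names (passing sort=True to pd.factorize gives the sorted per-column ordering without sklearn):

```python
import pandas as pd

# Tiny made-up frame, not the mushroom data.
df = pd.DataFrame({'V1': ['p', 'e', 'e', 'p'],
                   'V2': ['x', 'x', 'b', 'x']})

# Per column, codes in order of first appearance.
by_appearance = df.apply(lambda col: pd.factorize(col)[0])

# Per column, codes in sorted label order (the ordering LabelEncoder uses).
by_sorted = df.apply(lambda col: pd.factorize(col, sort=True)[0])

# Whole frame at once: a label gets the same code in every column.
shared = pd.DataFrame(
    pd.factorize(df.values.ravel())[0].reshape(df.shape),
    df.index, df.columns)

print(by_appearance)
print(by_sorted)
print(shared)
```

Which variant is appropriate depends on whether the same letter is meant to be the same category across columns. For the mushroom data each column has its own meaning, so a per-column encoding is probably what you want.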
