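I am trying to encode ordered categorical values in the third column of my dataset, where "Tiny Mongra" has the lowest value and "1st Wand" the highest. This is analogous to small, medium and large sizes; in this dataset the values represent the size of a grain of rice.
When I run this snippet, I keep getting the following error: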
Traceback (most recent call last):
File "<ipython-input-1-ae4501cc0ac1>", line 19, in <module>
X[:, 2] = ordinalencoder_X_3.fit_transform(X[:, 2])
File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 462, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 794, in fit
self._fit(X)
File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 61, in _fit
X = self._check_X(X)
File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 47, in _check_X
X_temp = check_array(X, dtype=None)
File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 552, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=['1st Wand' '1st Wand' '1st Wand' ... '1st Wand' '1st Wand' '1st Wand'].
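On further inspection, I found that the error is not complaining about the list of categories, but about the column I want to encode. For some reason it treats the column as a 1D array.
This is strange, because I used LabelEncoder to fit the other categorical values in the dataset and they worked fine.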
Here is a link to the data. See the "Data" sheet:
https://docs.google.com/spreadsheets/d/12nAU5QztVnVroRYDsRDsZGUyBpBTwAD5yMmbMaAxnHQ/edit?usp=sharing
Here is the full code. See the last part:
import numpy as np
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Ryze Price NN Data.csv')
X = dataset.iloc[:, 1:7].values
y = dataset.iloc[:, 7].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 0] = labelencoder_X_1.fit_transform(X[:, 0])
labelencoder_X_2 = LabelEncoder()
X[:, 1] = labelencoder_X_2.fit_transform(X[:, 1])
# SEE THIS PART
category_array = ["Tiny Mongra","Mini Mongra","Mongra","Super Mongra","Mini Dubar","Dubar","Super Dubar","Mini Tibar","Tibar","Super Tibar","2nd Wand","Super 2nd Wand","1st Wand"]
ordinalencoder_X_3 = OrdinalEncoder(categories=category_array)
X[:, 2] = ordinalencoder_X_3.fit_transform(np.array(X[:, 2]))
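I want the categorical data encoded as follows: "Tiny Mongra" should be encoded as 0, and so on up to "1st Wand", which should be encoded as 12.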
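Another option, instead of the ordinal encoder, is to use the pandas applymap function with a lambda that looks each value up in a mapping dictionary.
The mapping dictionary assigns each category name its integer rank.
With the data in a DataFrame, the mapping can then be used to create an additional column holding the encoded values.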
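The original snippets for this answer are not available, so the following is only a minimal sketch of the mapping approach under some assumptions: the column name `Size`, the sample rows, and the variable name `size_mapping` are hypothetical, and it uses `Series.map` rather than `applymap`, since only a single column is being encoded.

```python
import pandas as pd

# Categories ordered from smallest to largest, as listed in the question.
category_array = ["Tiny Mongra", "Mini Mongra", "Mongra", "Super Mongra",
                  "Mini Dubar", "Dubar", "Super Dubar", "Mini Tibar",
                  "Tibar", "Super Tibar", "2nd Wand", "Super 2nd Wand",
                  "1st Wand"]

# Mapping dictionary: category name -> rank (0 for "Tiny Mongra", 12 for "1st Wand").
size_mapping = {name: rank for rank, name in enumerate(category_array)}

# Hypothetical sample frame; the column name "Size" is an assumption.
df = pd.DataFrame({"Size": ["1st Wand", "Tiny Mongra", "Dubar", "Tibar"]})

# Look every value up in the dictionary and store the result in a new column.
df["Size_encoded"] = df["Size"].map(size_mapping)
print(df)
```

Values that are missing from the dictionary come out as NaN, which makes misspelled category names easy to spot.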
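LabelEncoder and OrdinalEncoder are meant for different inputs: LabelEncoder should be used for the target variable, and OrdinalEncoder should be used for feature variables. In general they work the same way, but:
LabelEncoder expects y: array-like of shape [n_samples]
OrdinalEncoder expects X: array-like of shape [n_samples, n_features]
If you only want to encode the values of a categorical variable as 0, 1, ..., n_classes-1, use LabelEncoder the same way you did for X1 and X2. Note that LabelEncoder assigns codes in sorted order of the labels, not in a custom order. I would instead use OrdinalEncoder to transform all three variables at once.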
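The answer's original snippet is not available. As a hedged sketch of how the ordered third column could be encoded with OrdinalEncoder, the example below assumes a small hypothetical array `sizes` standing in for `X[:, 2]`. It reshapes the column to 2D and passes `categories` as a list containing one list per feature column.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Categories ordered from smallest to largest, as listed in the question.
category_array = ["Tiny Mongra", "Mini Mongra", "Mongra", "Super Mongra",
                  "Mini Dubar", "Dubar", "Super Dubar", "Mini Tibar",
                  "Tibar", "Super Tibar", "2nd Wand", "Super 2nd Wand",
                  "1st Wand"]

# Hypothetical stand-in for the third column, X[:, 2], of the question's data.
sizes = np.array(["1st Wand", "Tiny Mongra", "Dubar", "Tibar"], dtype=object)

# `categories` takes one list per feature column, hence the extra brackets.
encoder = OrdinalEncoder(categories=[category_array])

# reshape(-1, 1) turns the 1D column of shape (4,) into a 2D array of shape (4, 1).
encoded = encoder.fit_transform(sizes.reshape(-1, 1))

# Flatten back to 1D, e.g. for assignment into a single column of X.
codes = encoded.ravel()
print(codes)
```

OrdinalEncoder returns floats by default, so the codes can be cast with `astype(int)` if integers are needed.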