Encoding ordinal values in Python

Posted 2024-10-01 11:35:51


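I am trying to encode ordered categorical values in column 3 of my dataset, where "Tiny Mongra" has the lowest value and "1st Wand" the highest. This is analogous to using small, medium and large sizes; in this dataset the values describe the size of a grain of rice.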

When I run this snippet, I keep getting the following error:

Traceback (most recent call last):

  File "<ipython-input-1-ae4501cc0ac1>", line 19, in <module>
    X[:, 2] = ordinalencoder_X_3.fit_transform(X[:, 2])

  File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/base.py", line 462, in fit_transform
    return self.fit(X, **fit_params).transform(X)

  File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 794, in fit
    self._fit(X)

  File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 61, in _fit
    X = self._check_X(X)

  File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 47, in _check_X
    X_temp = check_array(X, dtype=None)

  File "/Users/anhad/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 552, in check_array
    "if it contains a single sample.".format(array))

ValueError: Expected 2D array, got 1D array instead:
array=['1st Wand' '1st Wand' '1st Wand' ... '1st Wand' '1st Wand' '1st Wand'].

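On closer inspection, I found that the error is not complaining about the list of categories but about the column I want to encode. For some reason it considers the column to be a 1D array: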

^{pr2}$

This is strange, because I used LabelEncoder to encode the other categorical values in the dataset and those worked fine.

Here is a link to the data. See the "Data" sheet:

https://docs.google.com/spreadsheets/d/12nAU5QztVnVroRYDsRDsZGUyBpBTwAD5yMmbMaAxnHQ/edit?usp=sharing

Here is the full code. See the last part:

import numpy as np
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Ryze Price NN Data.csv')
X = dataset.iloc[:, 1:7].values
y = dataset.iloc[:, 7].values

# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

labelencoder_X_1 = LabelEncoder()
X[:, 0] = labelencoder_X_1.fit_transform(X[:, 0])

labelencoder_X_2 = LabelEncoder()
X[:, 1] = labelencoder_X_2.fit_transform(X[:, 1])

# SEE THIS PART
category_array = ["Tiny Mongra","Mini Mongra","Mongra","Super Mongra","Mini Dubar","Dubar","Super Dubar","Mini Tibar","Tibar","Super Tibar","2nd Wand","Super 2nd Wand","1st Wand"]
ordinalencoder_X_3 = OrdinalEncoder(categories=category_array)
X[:, 2] = ordinalencoder_X_3.fit_transform(np.array(X[:, 2]))

I want the categorical data to be encoded as follows: "Tiny Mongra" should be encoded as 0 ... "1st Wand" should be encoded as 12.


2 Answers

Another option, instead of using OrdinalEncoder, is to use the pandas applymap function with a lambda that looks each value up in a mapping dictionary.

Here is the mapping dictionary:

mapping = {"Tiny Mongra": 0, "Mini Mongra": 1, "Mongra": 2, "Super Mongra": 3,
           "Mini Dubar": 4, "Dubar": 5, "Super Dubar": 6, "Mini Tibar": 7,
           "Tibar": 8, "Super Tibar": 9, "2nd Wand": 10, "Super 2nd Wand": 11,
           "1st Wand": 12}

Suppose the following is my dataframe:

^{pr2}$

You can then create an additional column holding the encoded values with the following code:

df['mapped_category'] = df.applymap(lambda x : mapping[x])
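
As a hedged variation on the dictionary approach: when only one column needs encoding, `Series.map` applied to that column is a common alternative, and newer pandas releases deprecate `DataFrame.applymap` in favor of `DataFrame.map`. The sketch below assumes a hypothetical dataframe with a single column named `category`; neither the column name nor the sample rows come from the original post.

```python
import pandas as pd

# Ordered grades from the question, smallest to largest.
grades = [
    "Tiny Mongra", "Mini Mongra", "Mongra", "Super Mongra",
    "Mini Dubar", "Dubar", "Super Dubar",
    "Mini Tibar", "Tibar", "Super Tibar",
    "2nd Wand", "Super 2nd Wand", "1st Wand",
]

# Build the same mapping as above: position in the list is the code.
mapping = {name: code for code, name in enumerate(grades)}

# Hypothetical dataframe; the column name "category" is an assumption.
df = pd.DataFrame({"category": ["1st Wand", "Tiny Mongra", "Dubar", "Super Tibar"]})

# Series.map looks each value up in the dict; labels missing from the
# dict would become NaN instead of raising a KeyError.
df["mapped_category"] = df["category"].map(mapping)
print(df)
```

Because unknown labels turn into NaN rather than raising, checking `df["mapped_category"].isna().any()` afterwards is a cheap way to catch typos in the data.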

The main difference between LabelEncoder and OrdinalEncoder is their purpose:

  • LabelEncoder should be used for the target variable.
  • OrdinalEncoder should be used for feature variables.

In general they work the same way, but:

  • LabelEncoder expects y: array-like of shape [n_samples].
  • OrdinalEncoder expects X: array-like of shape [n_samples, n_features].
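
The shape requirement above is what triggers the "Expected 2D array, got 1D array" error in the question. A minimal sketch of one way to satisfy it is to reshape the single column to `(n_samples, 1)` and pass `categories` as a list containing one list per feature. The `column` array here is a made-up stand-in for `X[:, 2]`, not the real data.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Ordered grades from the question, smallest to largest.
grades = [
    "Tiny Mongra", "Mini Mongra", "Mongra", "Super Mongra",
    "Mini Dubar", "Dubar", "Super Dubar",
    "Mini Tibar", "Tibar", "Super Tibar",
    "2nd Wand", "Super 2nd Wand", "1st Wand",
]

# Hypothetical stand-in for X[:, 2] (assumption: a 1D object array of labels).
column = np.array(["1st Wand", "Tiny Mongra", "Dubar", "1st Wand"], dtype=object)

# categories takes one sequence per feature, so a single feature
# needs the ordered list wrapped in an outer list.
encoder = OrdinalEncoder(categories=[grades])

# reshape(-1, 1) turns shape (n_samples,) into (n_samples, 1).
encoded = encoder.fit_transform(column.reshape(-1, 1))

# Flatten back to 1D before writing the codes into a single column.
codes = encoded.ravel()
print(codes)
```

In the question's code, the corresponding change would be to pass `X[:, 2].reshape(-1, 1)` to `fit_transform` and assign the flattened result back to `X[:, 2]`.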

If you only want to encode the values of a categorical variable as 0, 1, ..., n, use LabelEncoder, the same way you did for X1 and X2.

labelencoder_X_3 = LabelEncoder()
X[:, 2] = labelencoder_X_3.fit_transform(X[:, 2])
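
One caveat worth illustrating: LabelEncoder assigns codes according to the sorted order of the labels, not a custom order, so it may not give "Tiny Mongra" the value 0. The short sketch below uses three of the labels from the question to show this.

```python
from sklearn.preprocessing import LabelEncoder

# Three labels from the question, listed smallest to largest.
sizes = ["Tiny Mongra", "Mongra", "1st Wand"]

encoder = LabelEncoder()
codes = encoder.fit_transform(sizes)

# classes_ is sorted lexicographically, so "1st Wand" comes first.
print(list(encoder.classes_))
print(codes.tolist())
```

Here "1st Wand" ends up with code 0 and "Tiny Mongra" with code 2, the reverse of the ordering the question asks for.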

But I would transform all three variables at once with OrdinalEncoder:

^{pr2}$
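
As a rough sketch of what encoding several columns in one call can look like: OrdinalEncoder accepts a 2D array and a `categories` list with one entry per column. The values in the first two columns below are invented for illustration, as is the choice to sort them alphabetically; only the grade ordering in the third column comes from the question.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Ordered grades from the question, smallest to largest.
grades = [
    "Tiny Mongra", "Mini Mongra", "Mongra", "Super Mongra",
    "Mini Dubar", "Dubar", "Super Dubar",
    "Mini Tibar", "Tibar", "Super Tibar",
    "2nd Wand", "Super 2nd Wand", "1st Wand",
]

# Hypothetical data: the first two columns are made-up categorical
# features; only the third column mirrors the question.
X_cat = np.array([
    ["Basmati", "Raw", "1st Wand"],
    ["Basmati", "Steamed", "Tiny Mongra"],
    ["Sona", "Raw", "Dubar"],
], dtype=object)

# One category list per column. The first two columns have no natural
# order here, so they are simply sorted; the third uses the custom order.
categories = [
    sorted(set(X_cat[:, 0])),
    sorted(set(X_cat[:, 1])),
    grades,
]

encoder = OrdinalEncoder(categories=categories)
X_encoded = encoder.fit_transform(X_cat)
print(X_encoded)
```

The input is already 2D with shape `(n_samples, 3)`, so no reshape is needed in this case.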
