How do I fix "ValueError: Expected 2D array, got 1D array" in sklearn/Python?

import sklearn
import numpy as np

#Importing a local data set from the desktop
import pandas as pd
mydata = pd.read_csv('file_format.csv',skipinitialspace=True)
print mydata

x_train = mydata.script
y_train = mydata.label

#print x_train
#print y_train

x_test = mydata.script

from sklearn import tree
classi = tree.DecisionTreeClassifier()

classi.fit(x_train, y_train)

predictions = classi.predict(x_test)
print predictions

   script  class  div   label
0       5      6    7    html
1       0      0    0  python
2       1      1    1     csv
Traceback (most recent call last):
  File "newtest.py", line 21, in <module>
    classi.fit(x_train, y_train)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 790, in fit
    X_idx_sorted=X_idx_sorted)
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 116, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/home/initiouser2/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 410, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 5. 0. 1.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

2 Answers

User

#1 · Edited 2024-09-25 00:32:37

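from sklearn.linear_model import LinearRegression

# `dataset` is assumed to be a pandas DataFrame that has already been loaded
X = dataset.iloc[:, 0].values  # 1D array, shape (n_samples,)
y = dataset.iloc[:, 1].values

regressor = LinearRegression()
X = X.reshape(-1, 1)  # make X 2D: shape (n_samples, 1)
regressor.fit(X, y)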

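I use the code above. reshape is not an in-place operation: it returns a new array and leaves the original unchanged. So you have to assign the reshaped result back to X, as shown above.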
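
A minimal sketch of that point, using a made-up three-element array (the values mirror the ones in the question's error message; the variable names are only illustrative). Calling reshape without assigning the result changes nothing:

```python
import numpy as np

# Hypothetical 1D feature vector, same values as in the error message
X = np.array([5.0, 0.0, 1.0])

X.reshape(-1, 1)          # result is discarded; X itself is unchanged
shape_before = X.shape    # still (3,)

X = X.reshape(-1, 1)      # assign the reshaped array back to X
shape_after = X.shape     # (3, 1): three samples, one feature

print(shape_before, shape_after)
```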

User

#2 · Edited 2024-09-25 00:32:37

When you pass input to the classifier, pass a 2D array (of shape (M, N), where N >= 1), not a 1D array (of shape (N,)). The error message is pretty clear:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

from sklearn.model_selection import train_test_split

# X.shape should be (N, M) where M >= 1
X = mydata[['script']]
# y.shape should be (N,)
y = mydata['label']
# perform label encoding if "label" contains strings
# y = pd.factorize(mydata['label'])[0]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
...

clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
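
To see the fix end to end, here is a minimal sketch that rebuilds the three rows printed in the question as an in-memory DataFrame (an assumption, since the original CSV file is not available) and fits the same DecisionTreeClassifier. With only three rows there is nothing sensible to split, so the train/test split is left out here; random_state=0 is an arbitrary choice.

```python
import pandas as pd
from sklearn import tree

# Toy data copied from the DataFrame printed in the question
mydata = pd.DataFrame({
    "script": [5, 0, 1],
    "class": [6, 0, 1],
    "div": [7, 0, 1],
    "label": ["html", "python", "csv"],
})

X = mydata[["script"]]   # double brackets -> DataFrame, shape (3, 1)
y = mydata["label"]      # single brackets -> Series, shape (3,)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, y)            # no ValueError, because X is 2D

# Predicting on the training rows only shows that the call works;
# it is not a meaningful accuracy estimate.
predictions = clf.predict(X)
print(X.shape, y.shape, list(predictions))
```

The only change relative to the question's code that matters for the error is `mydata[["script"]]` in place of `mydata.script`.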

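A few other useful tips:

Split your data into proper training and test sets. Don't test on your training data; that gives an inaccurate estimate of how good your classifier is.
I'd recommend factorizing your labels so that you are working with integers. It's just easier.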
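
A short sketch of what factorizing the labels could look like, using pd.factorize on a made-up label column (the four values below are hypothetical, loosely based on the labels in the question):

```python
import pandas as pd

# Hypothetical label column; "html" appears twice on purpose
labels = pd.Series(["html", "python", "csv", "html"])

# codes: one integer per row; uniques: the original label for each code
codes, uniques = pd.factorize(labels)
print(codes.tolist())
print(list(uniques))

# Map integer codes (e.g. model predictions) back to the original strings
decoded = uniques[codes]
print(list(decoded))
```

Keeping `uniques` around is what lets you translate integer predictions back into readable labels later.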
