ValueError: Found input variables with inconsistent numbers of samples: [29675, 9574, 29675]

Traceback (most recent call last):
  File "C:/Users/Ellen/Desktop/Python/ML_4.py", line 35, in <module>
    X_train, X_test, y_train, y_test = train_test_split(processed_features_train, processed_features_test, labels, test_size=1, random_state=0)
  File "C:\Python\Python37\lib\site-packages\sklearn\model_selection\_split.py", line 2184, in train_test_split
    arrays = indexable(*arrays)
  File "C:\Python\Python37\lib\site-packages\sklearn\utils\validation.py", line 260, in indexable
    check_consistent_length(*result)
  File "C:\Python\Python37\lib\site-packages\sklearn\utils\validation.py", line 235, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [29675, 9574, 29675]

tweets_train = pd.read_csv('Final.csv')
features_train = tweets_train.iloc[:, 1].values
labels = tweets_train.iloc[:, 0].values
vectorizer = CountVectorizer(stop_words=stopwords.words('english'))
processed_features_train = vectorizer.fit_transform(features_train).toarray()
tweets_test = pd.read_csv('DataF1.csv')
features_test = tweets_test.iloc[:, 1].values.astype('U')
vectorizer = CountVectorizer(stop_words=stopwords.words('english'))
processed_features_test = vectorizer.fit_transform(features_test).toarray()
X_train, X_test, y_train, y_test = train_test_split(processed_features_train, processed_features_test, labels, test_size=1, random_state=0)
text_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
#regr.fit(X_train, y_train)
text_classifier.fit(X_train, y_train)
predictions = text_classifier.predict(X_test)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))

Sample rows from the data (label, then tweet text):

neutral   tap to explore the biggest change to world wars since world war
neutral   tap to explore the biggest change to sliced bread.
negative  apple blocked
neutral   apple applesupport can i have a yawning emoji ? i think i am asking for the 3rd or 5th time
neutral   apple made with 20 more child labor
negative  apple is not she the one who said she hates americans ?

2 Answers

User

Answer #1 · Edited 2024-04-26 10:15:27

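This happens because you are passing three datasets to train_test_split rather than just X and y as arguments. train_test_split requires every array it receives to have the same number of samples, and here the training features and labels have 29675 rows while the test features have 9574.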
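As a minimal sketch of that behaviour, the snippet below calls train_test_split first with a matching X and y, then with an extra array of a different length. The toy arrays (X, y, X_other) are hypothetical stand-ins for the question's data, not the actual tweets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 3 features each, plus 10 labels.
X = np.arange(30).reshape(10, 3)
y = np.array([0, 1] * 5)

# X and y have the same number of rows, so they are split together.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)

# A third array with a different number of rows raises the ValueError
# seen in the question.
X_other = np.arange(12).reshape(4, 3)
try:
    train_test_split(X, X_other, y, test_size=0.2, random_state=0)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Every array passed to train_test_split is cut along the same row indices, which is why their lengths have to agree.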

User

Answer #2 · Edited 2024-04-26 10:15:27

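Because your test set is in a separate file, there is no need to split the data (unless you want a validation set, or the test set is unlabeled, as in a competition).
You should not fit a new vectorizer on the test data; doing so means there is no correspondence between the columns of the training set and those of the test set. Instead, use vectorizer.transform(features_test), with the same vectorizer object whose fit_transform produced the training data.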

So, try:

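import pandas as pd
from nltk.corpus import stopwords
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report, confusion_matrix

# Training data: column 0 holds the label, column 1 the tweet text
tweets_train = pd.read_csv('Final.csv')
features_train = tweets_train.iloc[:, 1].values
labels_train = tweets_train.iloc[:, 0].values
# Fit the vectorizer on the training text only
vectorizer = CountVectorizer(stop_words=stopwords.words('english'))
processed_features_train = vectorizer.fit_transform(features_train).toarray()

# Test data: reuse the fitted vectorizer so the columns match the training set
tweets_test = pd.read_csv('DataF1.csv')
features_test = tweets_test.iloc[:, 1].values.astype('U')
labels_test = tweets_test.iloc[:, 0].values
processed_features_test = vectorizer.transform(features_test).toarray()

text_classifier = RandomForestClassifier(n_estimators=200, random_state=0)
text_classifier.fit(processed_features_train, labels_train)
predictions = text_classifier.predict(processed_features_test)
print(confusion_matrix(labels_test, predictions))
print(classification_report(labels_test, predictions))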
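
To illustrate why the fitted vectorizer should be reused, here is a small sketch on a made-up corpus (train_texts and test_texts are hypothetical, loosely modelled on the sample tweets). It compares the feature matrix produced by transform with the one produced by fitting a second vectorizer on the test text.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical miniature corpora standing in for the two CSV files.
train_texts = ["apple blocked my account", "tap to explore the change"]
test_texts = ["apple made a yawning emoji"]

# Fit the vocabulary on the training text, then reuse it for the test text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Fitting a second vectorizer on the test text builds a different vocabulary,
# so its columns no longer line up with the training columns.
X_test_refit = CountVectorizer().fit_transform(test_texts)

print("train columns:", X_train.shape[1])
print("test columns (transform):", X_test.shape[1])
print("test columns (refit):", X_test_refit.shape[1])
```

With transform, words that were never seen during training are simply ignored, so the test matrix keeps exactly the columns the classifier was trained on.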
