Topic classification using a pickle file (Python)

Posted 2024-05-02 04:45:53


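I am trying to do topic classification using the pickle file of a model I trained, but I get the error "CountVectorizer - Vocabulary wasn't fitted". Can someone guide me on how to resolve this error?

Training dataset format: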

Topic   originalSentence 
Topic1  He has arrived with his sister's two young children.
Topic2  The Lowells have been living off the Colby fortune
Topic3  Fred and Janice Gage, who live off the Lowell  fortune, which would have gone to Alan Colby

My training code:

import pickle

import numpy as np
import pandas as pd
# Assumption: Incremental is dask_ml.wrappers.Incremental (the original post omits this import).
from dask_ml.wrappers import Incremental
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
from sklearn.linear_model import SGDClassifier  # also missing from the original post
from sklearn.model_selection import train_test_split


def train_model():
    # Load the training data and drop rows without a sentence.
    df = pd.read_csv('/Users/ra51646/Desktop/classification_training.csv')
    df = df[pd.notnull(df['originalSentence'])]
    df.columns = ['topic', 'originalSentence']

    # Map each topic name to an integer id and back.
    df['category_id'] = df['topic'].factorize()[0]
    category_id_df = df[['topic', 'category_id']].drop_duplicates().sort_values('category_id')
    category_to_id = dict(category_id_df.values)
    id_to_category = dict(category_id_df[['category_id', 'topic']].values)

    # These features and labels are computed but not used for training below.
    tfidf = TfidfVectorizer(sublinear_tf=True, min_df=5, norm='l2', encoding='latin-1', ngram_range=(1, 2), stop_words='english')
    features = tfidf.fit_transform(df.originalSentence).toarray()
    labels = df.category_id

    # Fit the vectorizer, the TF-IDF transformer and the classifier on the training split.
    X_train, X_test, y_train, y_test = train_test_split(df['originalSentence'], df['topic'], random_state=0)
    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform(X_train)
    tfidf_transformer = TfidfTransformer()
    X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
    clf_SGD = SGDClassifier().fit(X_train_tfidf, y_train)
    clf_inc = Incremental(clf_SGD)
    final_model = clf_inc.fit(X_train_tfidf, y_train, classes=np.unique(y_train))

    # Only the classifier is pickled; count_vect and tfidf_transformer are not saved.
    with open("/Users/ra51646/Desktop/Pickle/topic_classification.pkl", "wb") as f:
        pickle.dump(final_model, f)

Code that does topic classification using the pickle file (this is where the error occurs):

def find_topic1():
    model = pickle.load(open("/Users/ra51646/Desktop/Pickle/topic_classification.pkl","rb"))
    count_vect = CountVectorizer()
    answer = model.predict(count_vect.transform(["Lindy and her family went camping in the Outback"]))
    print(answer[0])
    return answer

I get the error NotFittedError: CountVectorizer - Vocabulary wasn't fitted. in the find_topic1 method. Please help me resolve this error. How do I use the pickle file (the trained model) for topic classification?
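For reference, the error does not depend on the pickled model at all. The snippet below is a minimal sketch, assuming only that scikit-learn is installed; the variable names (fresh_vect, fitted_vect) and the sample sentences are illustrative and not from the original post. It shows that calling transform on a CountVectorizer that was never fitted raises NotFittedError, while a fitted one works.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import CountVectorizer

# A brand-new vectorizer has no vocabulary yet, just like the one in find_topic1.
fresh_vect = CountVectorizer()
try:
    fresh_vect.transform(["Lindy and her family went camping in the Outback"])
    raised = False
except NotFittedError as exc:
    raised = True
    print(type(exc).__name__, exc)

# After fit, the vocabulary exists and transform returns one row per document.
fitted_vect = CountVectorizer().fit(["He has arrived with his sister's two young children."])
row = fitted_vect.transform(["two young children went camping"])
print(row.shape)
```

The exact wording of the exception message varies between scikit-learn versions, but the exception type is the same.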


1 answer

Answer 1 · Posted 2024-05-02 04:45:53

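Most likely the CountVectorizer in your prediction code is missing what ties it to training (for example its vocabulary argument), so the count_vect variable is independent of the pickled model and has never been fitted, which produces the error. It is impossible to be sure without an MCVE.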
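One common way to keep the vectorizer and the model together is to fit them as a single scikit-learn Pipeline and pickle that one object. The following is a sketch under stated assumptions, not the asker's exact setup: it uses a plain SGDClassifier instead of the Incremental wrapper, a tiny made-up corpus instead of the CSV file, and pickle.dumps/pickle.loads in memory instead of the file path from the question. The step names ("counts", "tfidf", "clf") are arbitrary.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Tiny made-up corpus, for illustration only (not the asker's training data).
sentences = [
    "He has arrived with his sister's two young children.",
    "The children went camping with their family.",
    "The Lowells have been living off the Colby fortune.",
    "The fortune would have gone to Alan Colby.",
]
topics = ["Topic1", "Topic1", "Topic2", "Topic2"]

# Vectorizer, TF-IDF transformer and classifier are fitted as one object.
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SGDClassifier(random_state=0)),
])
pipeline.fit(sentences, topics)

# Pickling the pipeline saves the fitted vocabulary along with the model.
payload = pickle.dumps(pipeline)
restored = pickle.loads(payload)

# Raw text goes straight into predict; no new CountVectorizer is created.
answer = restored.predict(["Lindy and her family went camping in the Outback"])
print(answer[0])
```

If the separate objects from the question have to be kept, the same idea applies: pickle count_vect and tfidf_transformer next to the model, and call their transform methods (not fit_transform) on new text before predict.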
