Pickle TfidfVectorizer along with a custom tokenizer

Posted 2024-10-01 07:34:37


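I am passing a custom tokenizer to TfidfVectorizer. The tokenizer depends on an external class, TermExtractor, which lives in another file.

Basically, I want to build a TfidfVectorizer based on specific terms rather than on all individual words/tokens.

Here is the code: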

import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from TermExtractor import TermExtractor

extractor = TermExtractor()


def tokenize_terms(text):
    # Join the parts of each extracted term with underscores to form one token.
    terms = extractor.extract(text)
    return ['_'.join(t) for t in terms]


def main():
    # stop_words and corpus are defined elsewhere (not shown in the question).
    vectorizer = TfidfVectorizer(lowercase=True, min_df=2, norm='l2', smooth_idf=True,
                                 stop_words=stop_words, tokenizer=tokenize_terms)
    vectorizer.fit(corpus)
    # Close the file deterministically instead of leaking the handle.
    with open("models/terms_vectorizer", "wb") as f:
        pickle.dump(vectorizer, f)

This runs fine, but whenever I try to reuse this TfidfVectorizer by loading it with pickle, I get an error:

[error output not preserved in the source]

How does Python pickle work when the pickled object depends on other classes?


1 answer

Answer 1 · posted 2024-10-01 07:34:37

Just figured it out: in the code that loads the pickled TfidfVectorizer, I need to define the tokenize_terms() function, import TermExtractor, and create an extractor:

extractor = TermExtractor()
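
For anyone wondering why that is needed: pickle stores a plain function by reference (its module name plus its function name), not its code, so the loading side has to be able to find a function with that name. Below is a minimal standard-library sketch of that behaviour, not the asker's actual setup. The module name "train_script", the dict standing in for the fitted vectorizer, and the simplified tokenizer body are all assumptions made for illustration.

```python
import pickle
import sys
import types


def tokenize_terms(text):
    # Stand-in tokenizer (assumption); the real one calls extractor.extract(text).
    return text.split()


# Pretend the function lives in a module called "train_script" (hypothetical name).
train_script = types.ModuleType("train_script")
tokenize_terms.__module__ = "train_script"
train_script.tokenize_terms = tokenize_terms
sys.modules["train_script"] = train_script

# A dict stands in for the fitted vectorizer, which keeps a reference to its tokenizer.
payload = pickle.dumps({"tokenizer": tokenize_terms, "min_df": 2})

# Only the module name and function name end up in the pickle, not the function body.
stored_by_name = b"train_script" in payload and b"tokenize_terms" in payload

# Simulate a loading script in which the function is not defined.
del train_script.tokenize_terms
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc

# Once the function can be found again, the very same bytes load fine.
train_script.tokenize_terms = tokenize_terms
restored = pickle.loads(payload)
tokens = restored["tokenizer"]("term extraction example")

print(stored_by_name, type(load_error).__name__, tokens)
```

The same reasoning applies to the module-level extractor that tokenize_terms() uses: it is a global looked up when the function runs, so it is not part of the pickle and has to be recreated in the loading code. Keeping tokenize_terms() and the extractor in a small shared module that both the training and the loading script import is one way to avoid redefining them by hand.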
