我试图计算tf-idf,这是我的代码:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from nltk.corpus import stopwords
import numpy as np
import numpy.linalg as LA
train_set = ["The sky is blue.", "The sun is bright."] #Documents
test_set = ["The sun in the sky is bright sun."] #Query
stopWords = stopwords.words('english')
vectorizer = CountVectorizer(stopWords)
#print vectorizer
transformer = TfidfTransformer()
#print transformer
trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()
testVectorizerArray = vectorizer.transform(test_set).toarray()
print 'Fit Vectorizer to train set', trainVectorizerArray
print 'Transform Vectorizer to test set', testVectorizerArray
transformer.fit(trainVectorizerArray)
print
print transformer.transform(trainVectorizerArray).toarray()
transformer.fit(testVectorizerArray)
print
tfidf = transformer.transform(testVectorizerArray)
print tfidf.todense()
我得到了这个错误:
^{pr2}$我使用的是scikit版本0.14.1。在
应该是
^{pr2}$除非文档中另有说明,否则始终对scikit学习对象的构造函数参数使用关键字参数。在
相关问题 更多 >
编程相关推荐