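<p>Here is another solution, using <code>CountVectorizer</code> and <code>TfidfTransformer</code>, that reports a <code>Tfidf</code> weight for each word (strictly, the IDF component stored in <code>idf_</code>, which is one value per word across the whole corpus):</p>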
<pre><code>from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# our corpus
corpus = ['I like dog', 'I love cat', 'I interested in cat']

# the default token_pattern drops one-letter tokens such as "I", so widen it
cv = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
# convert the text data into a term-count matrix
counts = cv.fit_transform(corpus)

tfidf_transformer = TfidfTransformer()
# convert the term-count matrix into a tf-idf matrix
tfidf_matrix = tfidf_transformer.fit_transform(counts)

# map each word to its IDF weight (one value per word, shared by all documents)
# get_feature_names_out() needs scikit-learn 1.0+; older versions use get_feature_names()
word2idf = dict(zip(cv.get_feature_names_out(), tfidf_transformer.idf_))
for word, score in word2idf.items():
    print(word, score)
</code></pre>
<hr/>
<p><strong>Output</strong>:</p>
<pre><code>cat 1.2876820724517808
dog 1.6931471805599454
i 1.0
in 1.6931471805599454
interested 1.6931471805599454
like 1.6931471805599454
love 1.6931471805599454
</code></pre>