Why does the tfidf model in 'gensim' throw away terms and counts after I transform my corpus?

Posted 2024-10-02 02:29:25


User


Why does the tf-idf model in gensim discard terms and counts after I transform my corpus?

My code:

from gensim import models

# Suppose you have a corpus made up of 4 documents,
# each a bag-of-words list of (term_id, count) pairs.
doc0 = [(0, 1), (1, 1)]
doc1 = [(0, 1)]
doc2 = [(0, 1), (1, 1)]
doc3 = [(0, 3), (1, 1)]

corpus = [doc0, doc1, doc2, doc3]

# Train a tf-idf model on the corpus.
tfidf = models.TfidfModel(corpus)

# Printing the corpus shows it is unchanged: still the raw frequency counts.
for d in corpus:
    print(d)
print()

# To get tf-idf weights, apply the trained model to the corpus;
# this yields a new, transformed corpus of normalized weights.
corpus_tfidf = tfidf[corpus]

for d in corpus_tfidf:
    print(d)

Output:

[output not preserved]


1 answer

User

Answer #1 · Posted 2024-10-02 02:29:25

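IDF is computed by dividing the total number of documents by the number of documents that contain the term, and then taking the logarithm of that quotient. In your example every document contains term0, so the IDF of term0 is log(1), which equals 0, and the term0 column of the document-term matrix is all zeros. Because gensim returns sparse vectors, those zero-weight entries are left out of the output entirely, and the remaining weights are normalized to unit length by default. That is why the raw counts no longer appear, and why doc1, which contains only term0, comes back as an empty vector.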
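To make the arithmetic concrete, here is a minimal pure-Python sketch (no gensim required) that computes the IDF of each term in the question's corpus. It assumes a base-2 logarithm, which I believe matches gensim's default `df2idf`; the helpers `document_frequencies` and `idf_table` are hypothetical names for illustration, not part of gensim.

```python
import math

# The toy corpus from the question: each document is a list of (term_id, count) pairs.
corpus = [
    [(0, 1), (1, 1)],
    [(0, 1)],
    [(0, 1), (1, 1)],
    [(0, 3), (1, 1)],
]


def document_frequencies(corpus):
    """Count how many documents contain each term id.

    Assumes each term id appears at most once per document (true for
    bag-of-words vectors); the count inside a document is ignored.
    """
    dfs = {}
    for doc in corpus:
        for term_id, _count in doc:
            dfs[term_id] = dfs.get(term_id, 0) + 1
    return dfs


def idf_table(corpus, log_base=2.0):
    """idf(t) = log(N / df(t)); base 2 is an assumption meant to mirror gensim."""
    n_docs = len(corpus)
    return {
        term_id: math.log(n_docs / df, log_base)
        for term_id, df in document_frequencies(corpus).items()
    }


idfs = idf_table(corpus)
print(idfs)  # term 0 -> 0.0, term 1 -> about 0.415
```

Term 0 occurs in all four documents, so its IDF is log2(4/4) = 0; term 1 occurs in three of them, giving log2(4/3) ≈ 0.415.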

A term that appears in every document gets no weight: it carries no information that distinguishes one document from another.
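The whole transformation can be imitated in a few lines. This sketch assumes the steps are: weight = count × IDF (base 2), drop zero-weight entries so the vector stays sparse, then L2-normalize to unit length, which I take to correspond to `TfidfModel`'s defaults. `tfidf_transform` is a hypothetical helper, not gensim's implementation, and the exact float formatting gensim prints may differ.

```python
import math

# The toy corpus from the question, as (term_id, count) pairs per document.
corpus = [
    [(0, 1), (1, 1)],
    [(0, 1)],
    [(0, 1), (1, 1)],
    [(0, 3), (1, 1)],
]


def tfidf_transform(corpus, log_base=2.0, eps=1e-12):
    """Hypothetical re-implementation of a sparse, unit-normalized tf-idf."""
    n_docs = len(corpus)

    # Document frequency of each term.
    df = {}
    for doc in corpus:
        for term_id, _count in doc:
            df[term_id] = df.get(term_id, 0) + 1
    idf = {t: math.log(n_docs / d, log_base) for t, d in df.items()}

    transformed = []
    for doc in corpus:
        # Raw tf-idf weights; terms with (near-)zero IDF are dropped entirely.
        weights = [(t, c * idf[t]) for t, c in doc if abs(idf[t]) > eps]
        # Scale the remaining weights to unit Euclidean length.
        length = math.sqrt(sum(w * w for _, w in weights))
        if length > 0:
            weights = [(t, w / length) for t, w in weights]
        transformed.append(weights)
    return transformed


for vec in tfidf_transform(corpus):
    print(vec)
```

doc1 comes back empty, and each of the other three documents reduces to a single entry for term 1 with a weight of about 1.0, because normalizing a one-entry vector always gives it unit length.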
