How do I display document IDs after computing TF-IDF and cosine_similarity? (Python)

Posted 2024-10-01 07:23:29

Male | A programmer who likes coding in Python.

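I compute TF-IDF for a query string and a set of documents. I want to compute the cosine similarity and display a list of document IDs, ordered from the document most relevant to the query to the least relevant.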

import os

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the documents (around 200 .txt files) from path
cranInp = []
path = "D:\\Desktop\\try\\web"
for file in os.listdir(path):
    textdir = os.path.join(path, file)
    with open(textdir) as f:
        cranInp.append(f.read())

Vcount = TfidfVectorizer(analyzer='word', ngram_range=(1, 1), stop_words='english')
countMatrix = Vcount.fit_transform(cranInp)

Query = "in summarizing theoretical and experimental work on the behaviour of a typical aircraft structure in a noise environment is it possible to develop a design procedure ."
# transform() expects an iterable of documents, so the query string is wrapped in a list
queryVects = Vcount.transform([Query])

k = 50
# Result has shape (1, n_documents): one similarity score per document
cosMattf = cosine_similarity(queryVects, countMatrix)

How can I get a list of the top K (K=50) documents, such as [12.txt, 34.txt, 89.txt, 90.txt, ..., 45.txt], where the list has 50 entries?

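The list should run from most relevant to least relevant. For example, if 12.txt has the smallest cosine distance, it is the document most relevant to the query.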
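One possible approach, offered as a sketch rather than a definitive answer: keep a list of file names in the same order as the texts passed to fit_transform, then sort the document indices by similarity score in descending order and take the first K. Cosine distance is 1 minus cosine similarity, so the smallest distance corresponds to the highest similarity. The names top_k_documents, rank_files_for_query and doc_ids below are hypothetical and do not come from the original code.

```python
def top_k_documents(similarities, doc_ids, k):
    """Return the k (doc_id, score) pairs with the highest similarity, best first.

    similarities: one score per document (for example, row 0 of the
                  cosine_similarity result); doc_ids: names in the same order.
    """
    order = sorted(range(len(doc_ids)), key=lambda i: similarities[i], reverse=True)
    return [(doc_ids[i], float(similarities[i])) for i in order[:k]]


def rank_files_for_query(doc_ids, texts, query, k):
    """Fit TF-IDF on texts, score the query against them, and return the top k."""
    # Imported here so top_k_documents stays usable without scikit-learn.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(texts)
    # transform() needs an iterable of documents, hence the list around query.
    query_vec = vectorizer.transform([query])
    # cosine_similarity returns shape (1, n_documents); row 0 holds the scores.
    similarities = cosine_similarity(query_vec, doc_matrix)[0]
    return top_k_documents(similarities, doc_ids, k)
```

Applied to the code in the question, this would mean appending each file name to a second list inside the loop that fills cranInp, then calling top_k_documents(cosMattf[0], that_list, k). If only the names are needed, the scores can be dropped with [name for name, _ in result].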


