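I managed to generate a vector for every sentence in my two corpora and to compute the cosine similarity (dot product) between every possible pair: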
import tensorflow_hub as hub
from sklearn.metrics.pairwise import cosine_similarity

# Load the Universal Sentence Encoder from TF Hub
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences1 = ["I'd like an apple juice",
              "An apple a day keeps the doctor away",
              "Eat apple every day",
              "We buy apples every week",
              "We use machine learning for text classification",
              "Text classification is subfield of machine learning"]
embeddings1 = embed(sentences1)  # one embedding vector per sentence

sentences2 = ["I'd like an orange juice",
              "An orange a day keeps the doctor away",
              "Eat orange every day",
              "We buy orange every week",
              "We use machine learning for document classification",
              "Text classification is some subfield of machine learning"]
embeddings2 = embed(sentences2)

# Entry [i, j] is the similarity between sentences1[i] and sentences2[j]
similarity = cosine_similarity(embeddings1, embeddings2)
similarity  # displayed in a notebook/REPL as:
array([[ 0.7882168 , 0.3366559 , 0.22973989, 0.15428472, -0.10180502,
-0.04344492],
[ 0.256085 , 0.7713026 , 0.32120776, 0.17834462, -0.10769081,
-0.09398925],
[ 0.23850328, 0.446203 , 0.62606746, 0.25242645, -0.03946173,
-0.00908459],
[ 0.24337521, 0.35571027, 0.32963073, 0.6373588 , 0.08571904,
-0.01240187],
[-0.07001016, -0.12002315, -0.02002328, 0.09045915, 0.9141338 ,
0.8373743 ],
[-0.04525191, -0.09421931, -0.00631144, -0.00199519, 0.75919366,
0.9686416 ]]
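For reference, each entry of this matrix is the dot product of two embeddings after both have been scaled to unit length, i.e. cos θ = a·b / (‖a‖‖b‖). The TF Hub documentation describes Universal Sentence Encoder outputs as approximately normalized, which is why the plain dot product gives nearly the same numbers. A minimal NumPy sketch of the computation, assuming toy 3-dimensional vectors in place of real embeddings; `cosine_similarity_matrix` is a hypothetical helper, not the scikit-learn function:

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Scale every row to unit length; the dot product of two unit vectors is their cosine.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    # Entry [i, j] compares row i of a with row j of b.
    return a_unit @ b_unit.T

# Toy vectors standing in for sentence embeddings (assumption, not USE output).
a = [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0]]
b = [[1.0, 0.0, 0.0],
     [0.0, 0.0, 2.0]]
print(cosine_similarity_matrix(a, b))
```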
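To get meaningful output, I need to sort these scores and return each one together with its corresponding input sentences. Does anyone know how to do this? I couldn't find any tutorial on this task.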
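One way to pair scores with sentences, sketched in plain Python: build a (score, sentence1, sentence2) triple for every cell of the matrix and sort the triples by score, highest first. `rank_pairs` is a hypothetical helper, and the example uses only the top-left 2×2 corner of the matrix above to stay short:

```python
def rank_pairs(sentences1, sentences2, scores):
    """Return (score, sentence1, sentence2) triples, most similar pair first."""
    pairs = [
        (scores[i][j], s1, s2)
        for i, s1 in enumerate(sentences1)
        for j, s2 in enumerate(sentences2)
    ]
    # Sort on the score only; Python's sort is stable, so ties keep their input order.
    return sorted(pairs, key=lambda p: p[0], reverse=True)

sentences1 = ["I'd like an apple juice", "An apple a day keeps the doctor away"]
sentences2 = ["I'd like an orange juice", "An orange a day keeps the doctor away"]
# Top-left 2x2 corner of the similarity matrix printed above.
scores = [[0.7882168, 0.3366559],
          [0.256085, 0.7713026]]

for score, s1, s2 in rank_pairs(sentences1, sentences2, scores):
    print(f"{score:.4f}  {s1!r} <-> {s2!r}")
```

With the full 6×6 matrix, passing `similarity.tolist()` (or the array itself) as `scores` works the same way.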
I was passing a string instead of a list of strings. Problem solved.
You can use np.argsort(...) to do the sorting.
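`np.argsort` returns the indices that would sort an array in ascending order, so sorting the negated scores gives descending order. A minimal sketch, assuming the similarity matrix is a NumPy array (as returned by `cosine_similarity`); `top_matches` is a hypothetical helper that lists, for each sentence of the first corpus, its `k` closest sentences from the second:

```python
import numpy as np

def top_matches(sentences1, sentences2, sim, k=1):
    """For each sentence in sentences1, return its k most similar sentences2 entries."""
    sim = np.asarray(sim)
    # Argsort of -sim sorts each row from highest to lowest score; keep the first k columns.
    order = np.argsort(-sim, axis=1)[:, :k]
    return [
        (sentences1[i], [(sentences2[j], float(sim[i, j])) for j in order[i]])
        for i in range(len(sentences1))
    ]

sentences1 = ["We use machine learning for text classification",
              "Text classification is subfield of machine learning"]
sentences2 = ["We use machine learning for document classification",
              "Text classification is some subfield of machine learning"]
# Bottom-right 2x2 corner of the similarity matrix from the question.
sim = np.array([[0.9141338, 0.8373743],
                [0.75919366, 0.9686416]])

for sentence, matches in top_matches(sentences1, sentences2, sim, k=1):
    print(sentence, "->", matches)
```

For a single ranking over all pairs instead, `np.argsort(-sim, axis=None)` sorts the flattened matrix, and `np.unravel_index(..., sim.shape)` converts those flat indices back to (row, column) pairs.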