How can I sort the vectors from sentence embeddings and output them together with their respective inputs?

import tensorflow_hub as hub
from sklearn.metrics.pairwise import cosine_similarity

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

embeddings1 = ["I'd like an apple juice",
               "An apple a day keeps the doctor away",
               "Eat apple every day",
               "We buy apples every week",
               "We use machine learning for text classification",
               "Text classification is subfield of machine learning"]
embeddings1 = embed(embeddings1)

embeddings2 = ["I'd like an orange juice",
               "An orange a day keeps the doctor away",
               "Eat orange every day",
               "We buy orange every week",
               "We use machine learning for document classification",
               "Text classification is some subfield of machine learning"]
embeddings2 = embed(embeddings2)

print(cosine_similarity(embeddings1, embeddings2))

array([[ 0.7882168 ,  0.3366559 ,  0.22973989,  0.15428472, -0.10180502, -0.04344492],
       [ 0.256085  ,  0.7713026 ,  0.32120776,  0.17834462, -0.10769081, -0.09398925],
       [ 0.23850328,  0.446203  ,  0.62606746,  0.25242645, -0.03946173, -0.00908459],
       [ 0.24337521,  0.35571027,  0.32963073,  0.6373588 ,  0.08571904, -0.01240187],
       [-0.07001016, -0.12002315, -0.02002328,  0.09045915,  0.9141338 ,  0.8373743 ],
       [-0.04525191, -0.09421931, -0.00631144, -0.00199519,  0.75919366,  0.9686416 ]])

2 Answers

User

Reply 1 · Edited 2024-10-06 12:31:52

I was passing a single string instead of a list of strings. Problem solved.
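
To illustrate the fix described in this reply, here is a minimal sketch of a guard that wraps a bare string in a list before it is handed to the encoder. The helper name `as_sentence_list` is hypothetical and not part of TensorFlow Hub; it only normalizes the input, on the assumption that the result is then passed to `embed(...)` as in the question.

```python
def as_sentence_list(sentences):
    """Return a list of strings, wrapping a bare string in a one-element list.

    Hypothetical helper for illustration only; it is not a TensorFlow Hub API.
    """
    if isinstance(sentences, str):
        return [sentences]
    return list(sentences)


# A single sentence becomes a one-element batch.
single = as_sentence_list("I'd like an apple juice")

# A list (or tuple) of sentences is returned as a plain list.
batch = as_sentence_list(("Eat apple every day", "We buy apples every week"))

print(single)
print(batch)
```

With the question's setup, the call would then look like `embed(as_sentence_list(text))`.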

User

Reply 2 · Edited 2024-10-06 12:31:52

You can use np.argsort(...) to do the sorting:

import numpy as np  # needed for np.array, np.meshgrid and np.argsort below
import tensorflow_hub as hub
from sklearn.metrics.pairwise import cosine_similarity

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

seq1 = ["I'd like an apple juice",
        "An apple a day keeps the doctor away",
        "Eat apple every day",
        "We buy apples every week",
        "We use machine learning for text classification",
        "Text classification is subfield of machine learning"]
embeddings1 = embed(seq1)

seq2 = ["I'd like an orange juice",
        "An orange a day keeps the doctor away",
        "Eat orange every day",
        "We buy orange every week",
        "We use machine learning for document classification",
        "Text classification is some subfield of machine learning"]
embeddings2 = embed(seq2)

a = cosine_similarity(embeddings1, embeddings2)

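def get_pairs(a, b):
    # Build an array of shape (len(a), len(b), 2) where
    # result[i, j] == [a[i], b[j]], matching the layout of the similarity matrix.
    a = np.array(a)
    b = np.array(b)

    c = np.array(np.meshgrid(a, b))   # shape (2, len(b), len(a))
    c = c.T.reshape(len(a), -1, 2)    # shape (len(a), len(b), 2)

    return c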

pairs = get_pairs(seq1, seq2)

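# For each column (each sentence in seq2), order the row indices by ascending similarity.
sorted_idx = np.argsort(a, axis=0)[..., None]

# Reorder the pairs along axis 0 to match. Plain pairs[sorted_idx] would
# fancy-index the first axis and return an array of shape (6, 6, 1, 6, 2).
sorted_pairs = np.take_along_axis(pairs, sorted_idx, axis=0)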


# Inspect the first few (unsorted) pairs for the first sentence in seq1.
print(pairs[0, 0])
print(pairs[0, 1])
print(pairs[0, 2])

Output:

["I'd like an apple juice" "I'd like an orange juice"]
["I'd like an apple juice" 'An orange a day keeps the doctor away']
["I'd like an apple juice" 'Eat orange every day']
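
As a complement to the approach above, here is a minimal sketch that uses np.argsort to list, for each sentence in the first set, the sentences of the second set from most to least similar. It is one possible reading of what the question asks for, not the answerer's own code. The function name `rank_matches` is hypothetical, and the small matrix is the top-left 3x3 block of the similarity matrix printed in the question, hard-coded so the sketch runs without downloading the encoder.

```python
import numpy as np


def rank_matches(sim, rows, cols):
    """For each row sentence, list (column sentence, score) by descending score.

    Hypothetical helper for illustration; `sim[i, j]` is assumed to be the
    similarity between rows[i] and cols[j].
    """
    sim = np.asarray(sim)
    # Negating the scores makes argsort return the highest similarity first.
    order = np.argsort(-sim, axis=1)
    ranked = []
    for i, row_sentence in enumerate(rows):
        matches = [(cols[j], float(sim[i, j])) for j in order[i]]
        ranked.append((row_sentence, matches))
    return ranked


rows = ["I'd like an apple juice",
        "An apple a day keeps the doctor away",
        "Eat apple every day"]
cols = ["I'd like an orange juice",
        "An orange a day keeps the doctor away",
        "Eat orange every day"]

# Top-left 3x3 block of the similarity matrix shown in the question.
sim = [[0.7882168, 0.3366559, 0.22973989],
       [0.256085, 0.7713026, 0.32120776],
       [0.23850328, 0.446203, 0.62606746]]

ranked = rank_matches(sim, rows, cols)
for sentence, matches in ranked:
    best_sentence, best_score = matches[0]
    print(f"{sentence!r} -> {best_sentence!r} ({best_score:.3f})")
```

With the real data, `sim` would be the output of cosine_similarity(embeddings1, embeddings2), and `rows` and `cols` would be seq1 and seq2.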
