If I have two word lists, how do I determine whether the two sentences they form are similar, given a 2D similarity matrix?

1 answer

Anonymous user

Answer 1 · posted 2024-09-27 20:21:03

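Since the similarity relation is transitive, you need to generate all the word combinations within the merged lists for which this property holds.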
For example, in your case the two lists ["intelligent", "smart"] and ["smart", "brilliant"] must be merged, because they share "smart".
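Merging these two lists gives ["intelligent", "smart", "brilliant"]. Now you need all of its pairwise combinations, which you can generate with itertools.combinations_with_replacement (with replacement, so that each word is also paired with itself).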
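
A small illustration of the merge-then-combine step on the two rows from the example. Sorting the merged set is my own addition, only to make the output order deterministic, since set iteration order is arbitrary.

```python
import itertools

# The two rows share "smart", so their words are merged into one group.
group = set(["intelligent", "smart"]) | set(["smart", "brilliant"])

# Every unordered pair of words in the group, including (w, w) pairs.
pairs = list(itertools.combinations_with_replacement(sorted(group), 2))
print(pairs)
# [('brilliant', 'brilliant'), ('brilliant', 'intelligent'), ('brilliant', 'smart'),
#  ('intelligent', 'intelligent'), ('intelligent', 'smart'), ('smart', 'smart')]
```
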
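Before generating the combinations, you need to know which lists this property applies to. For that you can use set.isdisjoint, which returns True if the two lists have no elements in common and False otherwise.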
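
A quick check of set.isdisjoint on rows from the example matrix. The argument can be any iterable, so the second row can stay a plain list.

```python
# No shared word: the rows are independent groups.
humans_vs_intelligent = set(["humans", "men"]).isdisjoint(["intelligent", "smart"])
# "smart" is shared: the rows must be merged.
intelligent_vs_brilliant = set(["intelligent", "smart"]).isdisjoint(["smart", "brilliant"])

print(humans_vs_intelligent)     # True
print(intelligent_vs_brilliant)  # False
```
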
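Now split each sentence, drop the words that do not appear in the matrix, zip the two remaining word lists together, and check that every resulting pair is in the new list built as described above.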
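
The split/filter/zip/check step can be sketched as a standalone function. The name `sentences_similar` is hypothetical, and the `similar_pairs` set is written out by hand for the example matrix rather than computed. Instead of padding with `zip_longest`, this sketch simply rejects sentences whose filtered word lists differ in length, which gives the same answer.

```python
def sentences_similar(a, b, similar_pairs, vocab):
    """Hypothetical helper: True if every aligned pair of matrix words is similar.

    similar_pairs: set of sorted 2-tuples of similar words.
    vocab: set of all words that appear in the matrix.
    """
    # Keep only the words that occur in the matrix, preserving their order.
    words_a = [w for w in a.split() if w in vocab]
    words_b = [w for w in b.split() if w in vocab]
    if len(words_a) != len(words_b):
        return False  # one sentence has a matrix word with no counterpart
    # Pair the words position by position and look each sorted pair up.
    return all(tuple(sorted(p)) in similar_pairs for p in zip(words_a, words_b))


# Pairs for the example matrix, written out by hand.
similar_pairs = {
    ("humans", "humans"), ("humans", "men"), ("men", "men"),
    ("brilliant", "brilliant"), ("brilliant", "intelligent"), ("brilliant", "smart"),
    ("intelligent", "intelligent"), ("intelligent", "smart"), ("smart", "smart"),
}
vocab = {w for pair in similar_pairs for w in pair}

print(sentences_similar("men are smart", "humans are intelligent", similar_pairs, vocab))  # True
```
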

Below is a demo of my idea (you can improve on it):

import itertools

m = [["humans", "men"], ["intelligent", "smart"], ["smart", "brilliant"]]

a = "men are smart"
b = "humans are intelligent"

new_matrix = set()
for x in range(len(m)):
    # every row is a similar group on its own; with replacement, so (w, w) pairs are included too
    new_matrix.add(tuple(itertools.combinations_with_replacement(m[x], 2)))
    for y in range(x + 1, len(m)):
        if not set(m[x]).isdisjoint(m[y]):
            # the rows share a word, so by transitivity all words of both rows are similar
            new_matrix.add(tuple(itertools.combinations_with_replacement(set(m[x]) | set(m[y]), 2)))

new_matrix2 = []
for pairs in new_matrix:  # flatten the nested tuples generated by itertools
    new_matrix2.extend(sorted(pair) for pair in pairs)

vocab = set(itertools.chain(*m))
new_a = [w for w in a.split() if w in vocab]  # remove the strings that are not present in the matrix
new_b = [w for w in b.split() if w in vocab]

print(all(sorted(x) in new_matrix2 for x in itertools.zip_longest(new_a, new_b, fillvalue='')))

If the sentences are similar, the program above prints True.
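
The loops above only merge rows that share a word directly, so for a chain such as [["a", "b"], ["b", "c"], ["c", "d"]] (my own hypothetical example) they never relate "a" and "d", even though transitivity says they should be similar. A different technique, not the one used in this answer, is union-find (disjoint-set union), which computes the full transitive closure. The names `build_groups` and `are_similar` are hypothetical, and the rule that words outside the matrix must match exactly (instead of being dropped) is my assumption about the intended semantics.

```python
def build_groups(matrix):
    """Union-find over words; returns find(word) -> representative of its similarity group."""
    parent = {}

    def find(w):
        parent.setdefault(w, w)
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving keeps the trees shallow
            w = parent[w]
        return w

    def union(u, v):
        parent[find(u)] = find(v)

    for row in matrix:
        for word in row:
            union(row[0], word)  # every word in a row joins the group of the row's first word
    return find


def are_similar(a, b, matrix):
    """Assumed semantics: same length, and each aligned word pair is identical or in one group."""
    words_a, words_b = a.split(), b.split()
    if len(words_a) != len(words_b):
        return False
    find = build_groups(matrix)
    vocab = {w for row in matrix for w in row}
    # Words outside the matrix must be identical; matrix words must share a group.
    return all(
        u == v or (u in vocab and v in vocab and find(u) == find(v))
        for u, v in zip(words_a, words_b)
    )


m = [["humans", "men"], ["intelligent", "smart"], ["smart", "brilliant"]]
print(are_similar("men are smart", "humans are intelligent", m))    # True
print(are_similar("a", "d", [["a", "b"], ["b", "c"], ["c", "d"]]))  # True
```

Unlike the filtering approach, this version returns False for "men are smart" vs "humans eat intelligent", because "are" and "eat" are not in the matrix and are not equal.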

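Another approach I considered is to take the union of two overlapping lists and remove their intersection. However, this does not work correctly once your inner lists contain more than two elements.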

# same m, a, b and itertools import as in the first snippet
new_matrix = set()
for x in range(len(m)):
    new_matrix.add(tuple(sorted(m[x])))  # each row is a similar pair on its own
    for y in range(x + 1, len(m)):
        if not set(m[x]).isdisjoint(m[y]):
            # union minus intersection (symmetric difference): for two 2-word rows
            # sharing one word, this is exactly the pair implied by transitivity
            new_matrix.add(tuple(sorted(set(m[x]) ^ set(m[y]))))


new_a = [x for x in a.split() if x in itertools.chain(*m)]  # remove the strings that are not present in the matrix
new_b = [x for x in b.split() if x in itertools.chain(*m)]

print(all(tuple(sorted(x)) in new_matrix for x in zip(new_a, new_b)))
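
To see why this second approach breaks for longer rows, here is the symmetric difference applied to the example rows and to a hypothetical 3-word row (the word "clever" is my own addition, not part of the question's matrix).

```python
# Two 2-word rows sharing one word: the result is exactly the implied pair.
two_word = sorted({"intelligent", "smart"} ^ {"smart", "brilliant"})
print(two_word)    # ['brilliant', 'intelligent']

# A 3-word row: the result has three words, so it can never equal a sorted
# 2-tuple built from the sentences, and the individual similar pairs are lost.
three_word = sorted({"clever", "intelligent", "smart"} ^ {"smart", "brilliant"})
print(three_word)  # ['brilliant', 'clever', 'intelligent']
```
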
