Comparing text in a table in Python

3 answers

User

Answer 1 · edited 2024-10-01 15:34:31

Python3

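As the code comments explain, we generate every possible pair of texts, turn each text into a set so that every word is counted once, and then count only the unique words the two texts have in common. If your list of texts is structured differently, the code may need some adjustment.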

import itertools

my_list = ["a text a", "an other text b", "a last text c and so on"]

def simil(text_a, text_b):
    # returns the number of unique words shared by the two texts
    return len(set(text_a.split()).intersection(set(text_b.split())))

results = []
# for each unique combination of texts
for pair in itertools.combinations(my_list, r=2):
    results.append(simil(*pair))

print(results)

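Side note: depending on what you want to do, you may want to look at algorithms such as TF-IDF (A simple tutorial) for text/document similarity, or at one of the many other similarity measures.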
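To make the TF-IDF suggestion more concrete, here is a minimal pure-Python sketch. It is not taken from the linked tutorial: the helper names `tfidf_vectors` and `cosine` are hypothetical, and the weighting used (term frequency times `log(N / df) + 1`) is only one of several common TF-IDF variants. It reuses `my_list` from the answer and scores each pair of texts with cosine similarity.

```python
import itertools
import math
from collections import Counter

my_list = ["a text a", "an other text b", "a last text c and so on"]

def tfidf_vectors(texts):
    """Return one {word: weight} dict per text (hypothetical helper).

    Assumed weighting: (count / text length) * (log(N / df) + 1).
    """
    tokenized = [text.split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts does each word occur?
    doc_freq = Counter(word for words in tokenized for word in set(words))
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append({
            word: (count / len(words)) * (math.log(n_docs / doc_freq[word]) + 1.0)
            for word, count in counts.items()
        })
    return vectors

def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * vec_b.get(word, 0.0) for word, weight in vec_a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in vec_a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = tfidf_vectors(my_list)
# Score every unique pair of texts, in the same order as itertools.combinations
scores = [cosine(a, b) for a, b in itertools.combinations(vectors, r=2)]
print(scores)
```

Unlike the raw count of shared words, these scores fall between 0 and 1 and give less weight to words that occur in many of the texts. For real workloads a tested library implementation is probably the better choice.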

User

Answer 2 · edited 2024-10-01 15:34:31

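The best approach is probably to use OrderedDict(), which is useful for preserving the order of the dict keys you extract.

Iterate over the dict and compare the values to get your output.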
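This answer gives no code, so the following is only a sketch of what it might mean. The layout of `table` (row labels mapped to text) and the helper `common_words` are assumptions for illustration; the original question's data structure is not shown here.

```python
from collections import OrderedDict
from itertools import combinations

# Hypothetical table: keys are row labels, values are the text in each row.
table = OrderedDict([
    ("row_1", "a text a"),
    ("row_2", "an other text b"),
    ("row_3", "a last text c and so on"),
])

def common_words(text_a, text_b):
    """Return the set of unique words that appear in both texts."""
    return set(text_a.split()) & set(text_b.split())

# Walk the dict in insertion order and compare every pair of values.
shared = OrderedDict()
for (key_a, text_a), (key_b, text_b) in combinations(table.items(), 2):
    shared[(key_a, key_b)] = common_words(text_a, text_b)

for keys, words in shared.items():
    print(keys, sorted(words))
```

Since Python 3.7 a plain dict also preserves insertion order, so OrderedDict mainly matters on older versions or when its extra methods are needed.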

User

Answer 3 · edited 2024-10-01 15:34:31

One possible approach is to convert each string into a set of words and then take the intersection of those sets.

string_1 = "hello bha njik bhavd bhavd bjavd manhbd kdkndsik wkjdk"
string_2 = "bhavd dskghfski fjfbhskf ewkjhsdkifs fjuekdjsdf ue"

# First split your strings into sets of words
set_1 = set(string_1.split())
set_2 = set(string_2.split())

# Compare the sets to find where they both have the same value
# (print() calls so the code runs on Python 3)
print(set_1 & set_2)
print(set_1.intersection(set_2))

# Both print {'bhavd'}
