Python: How many similar words are in a string?

string1 = "Fantini, Rauch, C.Straus, Priuli, Bertali: 'Festival Mass at the Imperial Court of Vienna, 1648' (Yorkshire Bach Choir & Baroque Soloists + Baroque Brass of London/Seymour)"
string2 = "Vinci, Leonardo {c.1690-1730}: Arias from Semiramide Riconosciuta, Didone Abbandonata, La Caduta dei Decemviri, Lo Cecato Fauzo, La Festa de Bacco, Catone in Utica. (Maria Angeles Peters sop. w.M.Carraro conducting)"

3 answers

User

#1 · edited 2024-10-01 00:31:28

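The Levenshtein algorithm itself is not restricted to comparing characters; it can compare arbitrary objects. That the classic form uses characters is an implementation detail: the items can be any symbols or structures that can be compared for equality.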

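In Python, convert each string to a list of words, then apply the algorithm to the lists. Perhaps someone else can help you clean out the unwanted characters, presumably with some regular-expression magic.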
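As a rough illustration of the idea above, here is a minimal sketch of a Levenshtein distance that works on lists of words instead of characters. The function names `word_levenshtein` and `to_words` are made up for this example, and treating lowercase runs of word characters as "words" is an assumption, not something the answer prescribes.

```python
import re


def word_levenshtein(words_a, words_b):
    """Edit distance between two sequences, counted in whole items (here: words)."""
    # previous[j] is the distance between the items of words_a seen so far
    # (before the current one) and the first j items of words_b.
    previous = list(range(len(words_b) + 1))
    for i, word_a in enumerate(words_a, start=1):
        current = [i]
        for j, word_b in enumerate(words_b, start=1):
            cost = 0 if word_a == word_b else 1
            current.append(min(
                previous[j] + 1,         # delete word_a
                current[j - 1] + 1,      # insert word_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def to_words(text):
    # Assumption: lowercase runs of word characters count as "words".
    return re.findall(r"\w+", text.lower())


distance = word_levenshtein(to_words("the quick brown fox"),
                            to_words("the slow brown dog"))
print(distance)  # 2
```

Because the function only relies on equality between items, the same code also works on plain strings, where it gives the usual character-level distance.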

User

#2 · edited 2024-10-01 00:31:28

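# sentence1 and sentence2 are the two strings to compare
n = 0
words1 = set(sentence1.split())  # a set gives fast membership tests
for word in sentence2.split():
    # strip some chars here, e.g. as in [1]
    if word in words1:
        n += 1  # counts every occurrence in sentence2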

([1]: How to remove symbols from a string with Python?)
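One possible way to do the "strip some chars" step is sketched below. It uses `str.strip` with `string.punctuation`, which is only one of several options and not necessarily what the linked question recommends; the function name `common_word_count` is invented for this example.

```python
import string


def common_word_count(sentence1, sentence2):
    """Count the words of sentence2 that also occur somewhere in sentence1."""
    # Assumption: punctuation is only stripped from the ends of each word,
    # so "London/Seymour" stays a single word.
    words1 = {w.strip(string.punctuation) for w in sentence1.split()}
    words1.discard("")  # a token made only of punctuation becomes empty
    n = 0
    for word in sentence2.split():
        word = word.strip(string.punctuation)
        if word and word in words1:
            n += 1
    return n


print(common_word_count("Hello, world!", "world (hello) Hello"))  # 2
```

The comparison stays case-sensitive here, which is why "(hello)" does not match "Hello"; lowercasing both sentences first would change that.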

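Edit: note that a word counts as common to both sentences if it appears anywhere in each of them. To compare positions instead, skip the set conversion (just call split() on both sentences) and compare the two word lists position by position.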
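A position-by-position comparison could look roughly like the sketch below. It is a guess at the intent rather than the answer's own code, and the function name `same_position_count` is hypothetical.

```python
def same_position_count(sentence1, sentence2):
    """Count positions at which both sentences have the same word."""
    words1 = sentence1.split()
    words2 = sentence2.split()
    # zip() stops at the shorter list, so extra trailing words are ignored.
    return sum(1 for w1, w2 in zip(words1, words2) if w1 == w2)


print(same_position_count("a b c d", "a x c"))  # 2
```

This variant is strict about alignment: a single inserted word shifts everything after it, which is the situation a word-level edit distance handles better.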

User

#3 · edited 2024-10-01 00:31:28

A regex can easily give you all the words:

import re
s1 = "Fantini, Rauch, C.Straus, Priuli, Bertali: 'Festival Mass at the Imperial Court of Vienna, 1648' (Yorkshire Bach Choir & Baroque Soloists + Baroque Brass of London/Seymour)"
s2 = "Vinci, Leonardo {c.1690-1730}: Arias from Semiramide Riconosciuta, Didone Abbandonata, La Caduta dei Decemviri, Lo Cecato Fauzo, La Festa de Bacco, Catone in Utica. (Maria Angeles Peters sop. w.M.Carraro conducting)"
# Raw strings keep '\w' from being read as a string escape sequence.
s1w = re.findall(r'\w+', s1.lower())
s2w = re.findall(r'\w+', s2.lower())

collections.Counter (Python 2.7+) can quickly count how many times each word occurs.
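A small sketch of how `collections.Counter` might be used here follows. The two short strings are made up to keep the example readable, and intersecting the counters with `&` is one possible way to compare them, not necessarily what the answer originally showed.

```python
import collections
import re

# Hypothetical short inputs, used only for this illustration.
s1 = "Baroque Soloists + Baroque Brass of London"
s2 = "Brass band of London"

c1 = collections.Counter(re.findall(r"\w+", s1.lower()))
c2 = collections.Counter(re.findall(r"\w+", s2.lower()))

print(c1["baroque"])  # 2

# '&' keeps each word with the smaller of its two counts.
shared = c1 & c2
print(sorted(shared))        # ['brass', 'london', 'of']
print(sum(shared.values()))  # 3
```

Unlike a plain set intersection, the counter intersection takes repeated words into account.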


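A very crude comparison can be done with set.intersection or difflib.SequenceMatcher, but it sounds like you want to implement a Levenshtein algorithm that operates on words, for which you can use these two lists.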

common = set(s1w).intersection(s2w)
# returns {'c'} (displayed as set(['c']) on Python 2)

import difflib
common_ratio = difflib.SequenceMatcher(None, s1w, s2w).ratio()
print('%.1f%% of words common.' % (100*common_ratio))

Prints: 3.4% of words common.
