<p>A regex can easily give you all of the words:</p>
<pre><code>import re
s1 = "Fantini, Rauch, C.Straus, Priuli, Bertali: 'Festival Mass at the Imperial Court of Vienna, 1648' (Yorkshire Bach Choir & Baroque Soloists + Baroque Brass of London/Seymour)"
s2 = "Vinci, Leonardo {c.1690-1730}: Arias from Semiramide Riconosciuta, Didone Abbandonata, La Caduta dei Decemviri, Lo Cecato Fauzo, La Festa de Bacco, Catone in Utica. (Maria Angeles Peters sop. w.M.Carraro conducting)"
# Raw strings keep the backslash in \w from being treated as a string escape
s1w = re.findall(r'\w+', s1.lower())
s2w = re.findall(r'\w+', s2.lower())
</code></pre>
<p><code>collections.Counter</code> (Python 2.7+) can quickly count how many times each word occurs.</p>
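<p>As a minimal sketch of that counting step: the two titles below are shortened, hypothetical stand-ins for <code>s1</code> and <code>s2</code>, and the helper name <code>word_counts</code> is an assumption made for illustration.</p>

```python
import re
from collections import Counter

# Shortened, hypothetical stand-ins for the two titles being compared.
title_a = "Festival Mass at the Imperial Court of Vienna (Baroque Soloists + Baroque Brass of London)"
title_b = "Arias from La Caduta dei Decemviri, La Festa de Bacco"


def word_counts(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r'\w+', text.lower()))


counts_a = word_counts(title_a)
counts_b = word_counts(title_b)

# A Counter behaves like a dict that maps each word to its frequency;
# words that never occurred simply report a count of 0.
print(counts_a['baroque'], counts_a['vienna'], counts_a['missing'])
print(counts_b.most_common(1))
```

<p><code>Counter</code> objects also support <code>&amp;</code> (the minimum count of each shared word) and <code>|</code> (the maximum count), which can be handy if you want overlap that respects repeated words.</p>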
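<p>A very rough comparison can be done with <code>set.intersection</code> or <code>difflib.SequenceMatcher</code>, but it sounds like you want to implement a Levenshtein algorithm that operates on words rather than characters, and you can use these two lists for that.</p>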
<pre><code>common = set(s1w).intersection(s2w)
# returns set(['c'])
import difflib
common_ratio = difflib.SequenceMatcher(None, s1w, s2w).ratio()
# Parenthesized form works on both Python 2.7 and Python 3
print('%.1f%% of words common.' % (100*common_ratio))
</code></pre>
<p>Prints: <code>3.4% of words common.</code></p>
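<p>If you do want the word-level Levenshtein distance mentioned above, a minimal sketch could look like the following. It treats each word as a single symbol and charges a cost of 1 for every insertion, deletion, or substitution. The function name <code>word_levenshtein</code> and the uniform costs are assumptions made for illustration, not part of any library.</p>

```python
def word_levenshtein(words_a, words_b):
    """Edit distance between two word lists, with unit costs (an assumption)."""
    # previous[j] holds the distance between the words of words_a seen so far
    # and the first j words of words_b.
    previous = list(range(len(words_b) + 1))
    for i, word_a in enumerate(words_a, start=1):
        current = [i]  # turning i words into an empty list takes i deletions
        for j, word_b in enumerate(words_b, start=1):
            cost = 0 if word_a == word_b else 1
            current.append(min(
                previous[j] + 1,         # delete word_a
                current[j - 1] + 1,      # insert word_b
                previous[j - 1] + cost,  # keep the word or substitute it
            ))
        previous = current
    return previous[-1]


# Hypothetical sample input: two pre-tokenized, lowercased titles.
first = "festival mass at the imperial court".split()
second = "festival mass at the court of vienna".split()
distance = word_levenshtein(first, second)
print(distance)
```

<p>You could pass <code>s1w</code> and <code>s2w</code> straight in, and divide the result by <code>max(len(s1w), len(s2w))</code> if you need a normalized score between 0 and 1.</p>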