<p><strong>Levenshtein距离</strong>对于<code>'ab'</code>和<code>'ac'</code>如下:</p>
<p><img src="https://i.stack.imgur.com/gCmRL.png" alt="image"/></p>
<p>所以排列是:</p>
<pre><code> a c
a b
</code></pre>
<p>对齐长度=2<br/>
不匹配数=1</p>
<p><code>Levenshtein Distance</code>是<code>1</code>,因为只需要一个替换就可以将<code>ac</code>转移到<code>ab</code>(或反向)</p>
<p>距离比=(Levenshtein距离)/(对齐长度)=0.5</p>
<p><strong>编辑</strong></p>
<p>你在写</p>
<p><code>(lensum - ldist) / lensum</code>=<code>(1 - ldist/lensum)</code>=1-0.5=0.5。</p>
<p>但这是匹配(而不是距离)<br/>
<a href="http://www.switchplane.com/blog/improving-search-with-levenshtein-distance.php" rel="noreferrer"><strong>REFFRENCE</strong></a>你可能会注意到</p>
<p><code>Matching %</code></p>
<pre><code>p = (1 - l/m) × 100
</code></pre>
<p>其中<code>l</code>是<code>levenshtein distance</code>,而<code>m</code>是<code>length of the longest of the two</code>单词:</p>
<p><sub><em>(<strong>注意</strong>:有些作者使用两者中最长的一个,我使用对齐长度)</em></sub></p>
<pre><code>(1 - 3/7) × 100 = 57.14...
(Word 1 Word 2 RATIO Mis-Match Match%
AB AB 0 0 (1 - 0/2 )*100 = 100%
CD AB 1 2 (1 - 2/2 )*100 = 0%
AB AC .5 1 (1 - 1/2 )*100 = 50%
</code></pre>
<p><sub>为什么有些作者除以对齐长度,另一个除以两者的最大长度?。。因为Levenshtein不考虑gap。距离=编辑次数(插入+删除+替换),而<a href="http://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm" rel="noreferrer">Needleman–Wunsch algorithm</a>这是标准的全局对齐方式,请考虑间隙。这是Neederman–Wunsch和Levenshtein之间的(间隙)差异,因此<strong>很多论文使用<em>两个序列之间的最大距离</em>(<strong>但这是我自己的理解,我不确定100%</strong>)</sub></p>
<p>下面是关于PAITERN分析的IEEE事务:<a href="http://www.csie.ntu.edu.tw/~b93076/Computation%20of%20Normalized%20Edit%20Distance%20and%20Applications.pdf" rel="noreferrer"><strong>Computation of Normalized Edit Distance and Applications</strong></a>在本文中,规范化编辑距离如下:</p>
<blockquote>
<p>Given two strings X and Y over a finite alphabet, the normalized edit distance between X and Y, d( X , Y ) is defined as the minimum of W( P ) / L ( P )w, here P is an editing path between X and Y , W ( P ) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (length of P).</p>
</blockquote>