<p>我认为生成一整套可能的编辑不会显著提高速度</p>
<p>通过避免在第一次编辑后重新扩展产生相同结果的字符串,可以节省35%到50%的执行时间。这可能会在编辑距离更大的情况下产生更显著的差异。改进程度还取决于实际字符串/单词(可能不是由单个重复字母组成)</p>
<p>在任何情况下,以下是该函数的优化版本:</p>
<pre><code>import string
from collections import deque
def makeEdits(S,count=1,charSet=None):
if charSet is None: charSet = string.printable
result = set([S])
expand = deque(result)
lastEdit = False
def addEdit(s):
if s in result: return
result.add(s)
if not lastEdit: expand.append(s)
for edit in range(count):
editing = expand.copy()
expand.clear()
lastEdit = edit == count-1
for eString in editing:
for i,char in enumerate(eString):
left,right = eString[:i],eString[i+1:]
addEdit(left+right) # deletion
for newChar in charSet:
addEdit(left+newChar+char+right) # insertions before
addEdit(left+newChar+right) # replacement
for newChar in charSet:
addEdit(eString+newChar) # insertion at end
return result
</code></pre>
<p>在我的笔记本电脑上进行测试和性能测试:</p>
<pre><code>from timeit import timeit
count = 1
input_string = "a" * 10
print('makeEdits()...')
print(len(makeEdits(input_string,2)))
t = timeit(lambda:makeEdits(input_string,2),number=count)
print(t)
print('f()...')
print(len(f(input_string)))
t = timeit(lambda:f(input_string),number=count)
print(t)
</code></pre>
<p></p>
<pre><code>makeEdits()...
1631129
2.0302121050000004
f()...
1631129
3.145120027999999
</code></pre>
<p>请注意,较小的字符集(如string.ascii_字母)将显著提高性能结果(通过生成较少的字符串):</p>
<pre><code>timeit(lambda:makeEdits(input_string,2,string.ascii_letters),number=count)
# 0.48669491199999015
len(makeEdits(input_string,2,string.ascii_letters)) # 433913
timeit(lambda:makeEdits(input_string,2,string.ascii_lowercase),number=count)
# 0.10699477299999671
len(makeEdits(input_string,2,string.ascii_lowercase)) # 104805
</code></pre>
<p>如果您正在制作拼写检查器,那么在处理之前将所有字符都转换为小写,并将所有特殊字符视为包含在字符集中的单个字符是合理的。这将允许您的程序获得较小字符集的性能提升(在本例中快30倍)。您只需要添加一些字符串的预处理,也许还需要在之后对结果进行一些调整</p>