If a letter appears repeatedly, drop it

3 Answers

User

Answer 1 · edited 2024-06-26 01:41:13

Here is a completely different approach that uses difflib from the standard library:

>>> import difflib
>>> words = open('/usr/share/dict/words').read().split()
>>> difflib.get_close_matches('aaaappplllee', words, 3, 0.5)
['appalled', 'apple', 'appellate']
>>> difflib.get_close_matches('aaardvarrk', words, 3, 0.5)
['aardvark', 'aardvarks', "aardvark's"]
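
The calls above depend on the system word list at /usr/share/dict/words, so their output varies from machine to machine. As a minimal, self-contained sketch, the same call can be tried against a small in-memory list. The `vocabulary` list and the `suggest` helper are made up for illustration and are not part of the original answer.

```python
import difflib

# Hypothetical stand-in for the system dictionary file.
vocabulary = ['aardvark', 'apple', 'zebra']

def suggest(word, words, n=3, cutoff=0.5):
    """Return up to n entries of `words` whose similarity ratio to `word` is >= cutoff."""
    return difflib.get_close_matches(word, words, n=n, cutoff=cutoff)

print(suggest('aaardvarrk', vocabulary))
print(suggest('aaaappplllee', vocabulary))
```

Raising `cutoff` makes the matching stricter, and lowering it lets more distant words through.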

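User

Answer 2 · edited 2024-06-26 01:41:13

Here is a solution that lets you iterate over every version of the string, using different combinations of the repeated letters: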

from itertools import product, groupby

# groups == ['aaaa', 'ppp', 'lll', 'ee']
groups = [''.join(g) for c, g in groupby('aaaappplllee')]

# lengths is an iterator that will return all combinations of string lengths to  
# use for each group, starting with [4, 3, 3, 2] and ending with [1, 1, 1, 1]
lengths = product(*[range(x, 0, -1) for x in map(len, groups)])

# Using the lengths from the previous line, this is a generator that yields all
# combinations of the original string with duplicate letters removed
words = (''.join(groups[i][:v] for i, v in enumerate(x)) for x in lengths)

>>> for word in words:
...     print(word)
...
aaaappplllee
aaaapppllle
aaaapppllee
aaaappplle
aaaappplee
aaaappple
...
apple
aplllee
apllle
apllee
aplle
aplee
aple

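This is not the most efficient way to find the correct word, but it is consistent with the OP's original approach to finding a match.

User

Answer 3 · edited 2024-06-26 01:41:13

If I understand your question correctly, you can do this with a regular expression: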
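
To connect the generator to an actual lookup, the variants can be checked against a set of known words until one matches. This is only a sketch: the function names and the tiny `vocabulary` set are assumptions made for illustration, and a real word list would be loaded from a file.

```python
from itertools import groupby, product

def reduced_variants(text):
    """Yield every version of `text` with each run of repeated letters shortened."""
    groups = [''.join(g) for _, g in groupby(text)]
    for lengths in product(*[range(len(g), 0, -1) for g in groups]):
        yield ''.join(g[:n] for g, n in zip(groups, lengths))

def first_known_variant(text, vocabulary):
    """Return the first variant found in `vocabulary`, or None if there is none."""
    for variant in reduced_variants(text):
        if variant in vocabulary:
            return variant
    return None

vocabulary = {'apple', 'aardvark'}  # hypothetical word set
print(first_known_variant('aaaappplllee', vocabulary))
```

The number of variants is the product of the run lengths, so this grows quickly for inputs with many long runs.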

import re
re.sub(r'(.)\1+', r'\1', 'aardvarrk')

This collapses every run of identical characters into a single character, which gives 'ardvark'.

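As for implementing the spell checker, I suggest "collapsing" every word in the dictionary that contains consecutive repeated characters and storing the results in a dictionary (the data structure), where the key is the collapsed word and the value is the original word (or possibly a set of original words).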
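
A minimal sketch of building such a dictionary is shown here. The helper names `collapse` and `build_collapsed_index` and the sample `word_list` are assumptions for illustration, not the answer's original code; the values are sets so that several words can share one key.

```python
import re
from collections import defaultdict

def collapse(word):
    """Squeeze every run of a repeated character down to one character."""
    return re.sub(r'(.)\1+', r'\1', word)

def build_collapsed_index(words):
    """Map each collapsed form to the set of original words that produce it."""
    index = defaultdict(set)
    for word in words:
        key = collapse(word)
        if key != word:  # keep only words that actually contain a repeated run
            index[key].add(word)
    return dict(index)

word_list = ['apple', 'person', 'computer', 'aardvark']  # hypothetical word list
collapsed_index = build_collapsed_index(word_list)
print(collapsed_index)  # {'aple': {'apple'}, 'ardvark': {'aardvark'}}
```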

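Now, when you analyze your input, do the following for each word:

Check whether it exists in your list of correct words. If it does, leave it alone. (Example: the input is 'person'. It is in the word list, so there is nothing to do.)
If it does not, "collapse" it and check whether:
1. It exists in your word list. If it does, replace the input with it. (Example: 'computerr' becomes 'computer'. Now just replace it with the original word from the list.)
2. It is a key in your dictionary. If it is, replace the input with the word associated with that key. (Example: 'aaapppleee' becomes 'aple'. You look up 'aple' in the word list and it is not there. You then look up the key 'aple' in the dictionary. If it is there, replace the input with its value, 'apple'.)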
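
The steps above can be sketched as a single function. The names `correct`, `valid_words` and `collapsed_index` are hypothetical, and when a key maps to several words this sketch simply takes the alphabetically first one.

```python
import re

def collapse(word):
    """Squeeze every run of a repeated character down to one character."""
    return re.sub(r'(.)\1+', r'\1', word)

def correct(word, valid_words, collapsed_index):
    """Return a replacement for `word`, or None if no rule applies."""
    if word in valid_words:
        return word  # already a correct word, nothing to do
    key = collapse(word)
    if key in valid_words:
        return key  # the collapsed form is itself a valid word
    if key in collapsed_index:
        # Arbitrary choice for this sketch: take the alphabetically first candidate.
        return sorted(collapsed_index[key])[0]
    return None

valid_words = {'person', 'computer', 'apple'}  # hypothetical word list
collapsed_index = {'aple': {'apple'}}          # hypothetical collapsed dictionary
for raw in ['person', 'computerr', 'aaapppleee']:
    print(raw, '->', correct(raw, valid_words, collapsed_index))
```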

The only problem I see with this approach is that two valid words may "collapse" into the same "word", which means you have to use a set as the value.

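Suppose 'hallo' and another valid word both collapse to the same key, and the user enters a misspelling that collapses to that key as well. Now you have to decide which one to use as the replacement. This can be done by computing the Levenshtein distance between the input and each possible replacement.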
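
As a rough sketch of that tie-break, the function below computes the Levenshtein distance with the usual row-by-row dynamic programming and picks the nearest candidate. The candidate 'halo' and the input 'hallooo' are example values chosen here for illustration, and `closest` is a hypothetical helper.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]

def closest(word, candidates):
    """Pick the candidate nearest to `word`; ties go to the alphabetically first."""
    return min(sorted(candidates), key=lambda c: levenshtein(word, c))

# Example values assumed for this sketch.
print(closest('hallooo', {'hallo', 'halo'}))
```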
