<p>虽然jurgenreza已经解释了为什么您的程序不起作用,但解决方案仍然相当缓慢。如果您只检查子串<code>s</code>,并且知道<code>s[:-1]</code>重复出现,那么您将得到一个更快的解决方案(通常快100倍甚至更多):</p>
<pre><code>from collections import defaultdict
def pfind(prefix, sequences):
collector = defaultdict(list)
for sequence in sequences:
collector[sequence[0]].append(sequence)
for item, matching_sequences in collector.items():
if len(matching_sequences) >= 2:
new_prefix = prefix + item
yield (new_prefix, len(matching_sequences))
for r in pfind(new_prefix, [sequence[1:] for sequence in matching_sequences]):
yield r
def find_repeated_substrings(s):
s0 = s + " "
return pfind("", [s0[i:] for i in range(len(s))])
</code></pre>
<p>如果你想要口述,你可以这样称呼它:</p>
^{pr2}$
<p>在我的机器上,运行2247个元素需要0.02秒,而原始的(修正的)解决方案需要12.72秒。在</p>
<p>(请注意,这是一个相当幼稚的实现;使用索引而不是子字符串应该更快。)</p>
<p><strong>编辑:</strong>以下变量适用于其他序列类型(不仅仅是字符串)。而且,它不需要哨兵。在</p>
<pre><code>from collections import defaultdict
def pfind(s, length, ends):
collector = defaultdict(list)
if ends[-1] >= len(s):
del ends[-1]
for end in ends:
if end < len(s):
collector[s[end]].append(end)
for key, matching_ends in collector.items():
if len(matching_ends) >= 2:
end = matching_ends[0]
yield (s[end - length: end + 1], len(matching_ends))
for r in pfind(s, length + 1, [end + 1 for end in matching_ends if end < len(s)]):
yield r
def find_repeated_substrings(s):
return pfind(s, 0, list(range(len(s))))
</code></pre>
<p>这仍然存在很长的子串将超过递归深度的问题。您可能需要捕获异常。在</p>