<p>我想您正在寻找一个解决<a href="http://en.wikipedia.org/wiki/Longest_common_substring_problem" rel="nofollow">longest common substring problem</a>的特殊情况的方法。虽然使用后缀树或动态编程可以解决这个问题,但排序“字符串”的特殊情况更容易解决。在</p>
<p>我想这里的代码可以给你想要的值。它的单参数是一个排序序列的序列。它的返回值是list,其中包含每个内部序列的2元组。元组值是序列之间最长公共子串的切片索引。注意,如果没有公共的子字符串,元组都是<code>(0,0)</code>,这将导致空片段(我认为这是正确的,因为空片段将彼此相等!)。在</p>
<p>代码:</p>
<pre><code>def longest_common_substring_sorted(sequences):
l = len(sequences)
current_indexes = [0]*l
current_substring_length = 0
current_substring_starts = [0]*l
longest_substring_length = 0
longest_substring_starts = current_substring_starts
while all(index < len(sequence) for index, sequence
in zip(current_indexes, sequences)):
m = min(sequence[index] for index, sequence
in zip(current_indexes, sequences))
common = True
for i in range(l):
if sequences[i][current_indexes[i]] == m:
current_indexes[i] += 1
else:
common = False
if common:
current_substring_length += 1
else:
if current_substring_length > longest_substring_length:
longest_substring_length = current_substring_length
longest_substring_starts = current_substring_starts
current_substring_length = 0
current_substring_starts = list(current_indexes)
if current_substring_length > longest_substring_length:
longest_substring_length = current_substring_length
longest_substring_starts = current_substring_starts
return [(i, i+longest_substring_length)
for i in longest_substring_starts]
</code></pre>
<p>测试输出:</p>
^{pr2}$
<p>我很抱歉没有很好地注释代码。该算法有点类似于mergesort的merge步骤。基本上,它跟踪每个序列的索引。当它迭代时,它会增加与最小值相等的值对应的所有索引。如果所有列表中的当前值都相等(等于最小值,因此彼此相等),则它知道它位于所有列表的公共子字符串中。当子字符串结束时,将根据迄今为止找到的最长子字符串对其进行检查。在</p>