ABQACBBA
ZBABBA
Eliminate symmetric difference
ABABBA and BABBA
Start with the first A in ABABBA.
It is followed by B, giving the elements [0,1]
Check to see if AB is in BABBA, and record a match at [1,2]
So, the first pair is ((0,1), (1,2))
Next, try the first B in ABABBA.
It is followed by an A giving the elements [1,2]
Check to see if BA is in BABBA and record a match at [0,1]
Continue with the rest of the letters in ABABBA.
Then, try extensions.
The first pair AB at [0,1] and [1,2] can be extended from BA
to ABA [0,1,3] and [1,2,4]. Note, the latter group is all the
way to the right so it cannot be extended farther. ABA cannot
be extended.
Continue until all sequences have extended are far as possible.
Keep only the best of those.
我想您正在寻找的是Longest Common Subsequence算法,它可以非常快速地找到这个序列(在O(n2)时间内)。该算法基于一个简单的动态规划递归,网上有很多例子可以说明如何实现该算法。在
直观地说,该算法使用以下递归分解来工作,该分解通过查看每个序列的第一个三元组来工作:
希望这有帮助!在
生物信息学中常用的Sequence alignment算法可以在这里使用。它们主要用于对齐一个字符序列,但是可以修改它们以接受n个字符序列。Needleman–Wunsch algorithm是一个相当有效的方法。在
首先,您至少可以通过计算set symmetric difference来消除不在两个序列中出现的任何三元组来减少这个问题。在
对于最长的子序列,该算法使用dynamic programming方法。对于每一个三元组,找出长度为2的最短子串。循环这些对,尝试通过组合这些对来扩展它们。一直延伸,直到每个三元组都有最长的延伸。选择其中最长的:
相关问题 更多 >
编程相关推荐