How to find the largest contiguous overlapping region in a set of sorted arrays

import numpy as np

# arrays: a list of sorted 1-D NumPy arrays
mins = [np.min(t) for t in arrays]
maxs = [np.max(t) for t in arrays]
lower_bound = np.max(mins)  # largest of the minima
upper_bound = np.min(maxs)  # smallest of the maxima
lower_row = [np.searchsorted(arr, lower_bound, side='left') for arr in arrays]
upper_row = [np.searchsorted(arr, upper_bound, side='right') for arr in arrays]
result = zip(lower_row, upper_row)
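For readers who want to see what the snippet in the question produces, here is a minimal sketch on made-up data. It swaps `np.searchsorted` for the standard library's `bisect_left` and `bisect_right`, which locate the same insertion points in sorted lists; the sample `arrays` and the names `slices` and `windows` are assumptions for illustration only.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample data; every list is sorted in ascending order.
arrays = [[1, 2, 3, 4, 5], [3, 4, 5, 6], [2, 3, 4, 5, 7]]

lower_bound = max(min(arr) for arr in arrays)  # largest of the minima
upper_bound = min(max(arr) for arr in arrays)  # smallest of the maxima

# bisect_left / bisect_right play the role of searchsorted with
# side='left' / side='right'.
slices = [(bisect_left(arr, lower_bound), bisect_right(arr, upper_bound))
          for arr in arrays]

# The window each (start, stop) pair selects from its array.
windows = [arr[lo:hi] for arr, (lo, hi) in zip(arrays, slices)]
print(slices)
print(windows)
```

Note that this only bounds each array by value range. It does not check that the values inside the windows match: with `[1, 3, 5]` and `[1, 2, 3, 4, 5]` the windows would have different lengths.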

2 Answers

User

Answer 1 · edited 2024-10-17 06:27:55

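I think you are looking for a solution to a special case of the longest common substring problem. That problem can be solved in general with a suffix tree or dynamic programming, but the special case of sorted "strings" is much easier.

I think the code here gives you the values you want. Its single argument is a sequence of sorted sequences. Its return value is a list containing one 2-tuple per inner sequence. The tuple values are the slice indexes of the longest common substring shared by the sequences. Note that if there is no common substring, the tuples will all be (0, 0), which yields empty slices (I think that is correct, since the empty slices will all be equal to each other!).

Code: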

def longest_common_substring_sorted(sequences):
    l = len(sequences)
    current_indexes = [0]*l
    current_substring_length = 0
    current_substring_starts = [0]*l
    longest_substring_length = 0
    longest_substring_starts = current_substring_starts

    while all(index < len(sequence) for index, sequence
              in zip(current_indexes, sequences)):
        m = min(sequence[index] for index, sequence
                in zip(current_indexes, sequences))
        common = True
        for i in range(l):
            if sequences[i][current_indexes[i]] == m:
                current_indexes[i] += 1
            else:
                common = False

        if common:
            current_substring_length += 1
        else:
            if current_substring_length > longest_substring_length:
                longest_substring_length = current_substring_length
                longest_substring_starts = current_substring_starts
            current_substring_length = 0
            current_substring_starts = list(current_indexes)

    if current_substring_length > longest_substring_length:
        longest_substring_length = current_substring_length
        longest_substring_starts = current_substring_starts

    return [(i, i+longest_substring_length)
            for i in longest_substring_starts]

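I apologize for not commenting the code well. The algorithm is somewhat similar to the merge step of mergesort. Basically, it keeps track of an index into each sequence. On each iteration, it increments every index whose current value equals the minimum value. If the current values in all the lists are equal (to the minimum, and therefore to each other), it knows it is inside a substring common to all the lists. When a substring ends, it is compared against the longest substring found so far.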
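Since the description above is easier to follow next to commented code, here is a condensed sketch of the same merge-style walk. It is a restatement for illustration, not the answer's exact code: the name `common_run_slices`, the variable names, and the sample `sequences` are all assumptions, and it assumes each inner sequence is sorted with no repeated values.

```python
def common_run_slices(sequences):
    """Return one (start, stop) slice per sequence marking the longest run
    of values that all the sorted sequences share contiguously."""
    n = len(sequences)
    idx = [0] * n          # current read position in each sequence
    run_len = 0            # length of the shared run being tracked
    run_starts = [0] * n   # where the current run began in each sequence
    best_len, best_starts = 0, [0] * n

    while all(i < len(s) for i, s in zip(idx, sequences)):
        smallest = min(s[i] for i, s in zip(idx, sequences))
        # Which sequences currently sit on the smallest value?
        at_min = [s[i] == smallest for i, s in zip(idx, sequences)]
        # Advance exactly those sequences (True counts as 1, False as 0).
        idx = [i + hit for i, hit in zip(idx, at_min)]
        if all(at_min):
            # Every sequence held the same value: the shared run grows.
            run_len += 1
            if run_len > best_len:
                best_len, best_starts = run_len, run_starts
        else:
            # The run is broken; a new one can start at the current positions.
            run_len = 0
            run_starts = list(idx)

    return [(start, start + best_len) for start in best_starts]


# Hypothetical sample data.
sequences = [[1, 2, 3, 4, 5, 8], [0, 3, 4, 5, 6], [2, 3, 4, 5, 7, 8]]
slices = common_run_slices(sequences)
for seq, (lo, hi) in zip(sequences, slices):
    print(seq[lo:hi])  # prints [3, 4, 5] for each sequence
```

Unlike the answer's code, this sketch updates the best run as soon as it grows instead of when it ends; the result should be the same, including (0, 0) tuples when nothing is shared.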

User

Answer 2 · edited 2024-10-17 06:27:55

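I'm sure there are different ways to do this. I would use the merge algorithm to walk through the two arrays while keeping track of the overlapping region. If you are not familiar with the idea, take a look at merge sort; hopefully between that and the code it is clear how this works.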

def find_overlap(a, b):
    i = 0
    j = 0
    len_a = len(a)
    len_b = len(b)
    in_overlap = False
    count = 0
    start = (-1, -1)
    best_count = 0
    best_start = (-1, -1)
    best_end = (-1, -1)

    while i < len_a and j < len_b:

        if a[i] == b[j]:
            if in_overlap:
                # Keep track of the length of the overlapping region
                count += 1
            else:
                # This is a new overlapping region: set count to 1 and record the start
                in_overlap = True
                count = 1
                start = (i, j)
            # Step both indices
            i += 1
            j += 1
            end = (i, j)
            if count > best_count:
                # Is this the longest overlapping region so far?
                best_count = count
                best_start = start
                best_end = end
        # If not in an overlapping region, only step one index
        elif a[i] < b[j]:
            in_overlap = False
            i += 1
        elif b[j] < a[i]:
            in_overlap = False
            j += 1
        else:
            # This should never happen for values that can be ordered
            raise ValueError("values are not comparable")
    # End of loop

    return best_start, best_end

Note that end here is returned in the Python convention (exclusive, as in slicing), so if a=[0, 1, 2] and {}, then start=(0, 0) and {}.
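To make the exclusive end concrete, here is a small self-contained sketch. The function is a condensed restatement of the two-pointer walk above, not the answer's exact code; the name `find_overlap_compact` is hypothetical and the input `b` is an assumed value chosen for illustration.

```python
def find_overlap_compact(a, b):
    """Condensed two-pointer walk over two sorted lists.
    Returns (start, end) index pairs; end is exclusive."""
    i = j = count = 0
    start = (0, 0)
    best_count, best_start, best_end = 0, (-1, -1), (-1, -1)
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            if count == 0:
                start = (i, j)  # a new shared run begins here
            count += 1
            i += 1
            j += 1
            if count > best_count:
                best_count, best_start, best_end = count, start, (i, j)
        elif a[i] < b[j]:
            count = 0  # run broken; step only the list with the smaller value
            i += 1
        else:
            count = 0
            j += 1
    return best_start, best_end


a = [0, 1, 2]
b = [0, 1, 4]  # assumed input, chosen for illustration
start, end = find_overlap_compact(a, b)
shared_a = a[start[0]:end[0]]  # plain slicing works because end is exclusive
shared_b = b[start[1]:end[1]]
print(start, end, shared_a, shared_b)
```

When the lists share nothing, both tuples stay at (-1, -1), and a[-1:-1] is an empty list, so the slices still behave sensibly.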
