Merge word segments that overlap or are contained in other segments

Posted 2024-10-03 15:24:56


I'm working on a project that involves word segmentation of images containing handwritten text. I'm doing this with the scale space technique for word segmentation.

One problem is overlapping segments, as shown in the image:

[Image: example of overlapping word segments]

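I want to merge any 2 segments that overlap (or where one segment is contained inside another) into a single segment, for all such segments in a line.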
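The merge conditions above (overlap, containment, or a gap smaller than a threshold) can be written as small predicates. This is a minimal sketch, assuming boxes in the `[(x1, y1), (x2, y2)]` corner format used in the code below and a horizontal threshold `dist` (20 px, the code's default); the helper names `horizontal_gap`, `is_contained` and `should_merge` are hypothetical.

```python
def horizontal_gap(a, b):
    """Horizontal gap between boxes a and b.

    Positive: empty space between them; 0: edges touch; negative: x-ranges overlap.
    """
    return max(a[0][0], b[0][0]) - min(a[1][0], b[1][0])


def is_contained(inner, outer):
    """True if box `inner` lies entirely inside box `outer` (shared edges allowed)."""
    return (outer[0][0] <= inner[0][0] and outer[0][1] <= inner[0][1]
            and inner[1][0] <= outer[1][0] and inner[1][1] <= outer[1][1])


def should_merge(a, b, dist=20):
    """Merge when the x-ranges overlap, one box contains the other, or the gap is at most dist."""
    return horizontal_gap(a, b) <= dist or is_contained(a, b) or is_contained(b, a)
```

For a non-negative `dist`, containment already implies a gap of at most 0, so the explicit containment checks only make the intent readable.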

Below is the code I tried:

def segment_cleaner(seg, dist=20):
    """
    seg : segments as a list of lists of tuples in the form [[(x1,y1),(x1+w,y1+h)],[(x2,y2),(x2+w,y2+h)],...]
    dist : maximum horizontal gap between 2 segments; if the gap is at most dist, the 2 segments are merged
    """
    if len(seg) == 2:
        return seg

    else:

        cleaned_seg = []
        # arrange segments from left to right
        sorted_seg = sorted(seg, key=lambda x: x[0])

        # loop through segments; if two segments overlap, one is contained inside the other,
        # or they are less than dist apart, combine them
        x_pointer = 0
        for i in range(len(sorted_seg) - 1):
            i_th_seg = sorted_seg[i]
            i_plus1th_seg = sorted_seg[i + 1]

            # condition for containment: the (i+1)-th segment lies strictly inside the i-th one
            contained = (i_th_seg[0][0] < i_plus1th_seg[0][0] and i_th_seg[0][1] < i_plus1th_seg[0][1]
                         and i_th_seg[1][0] > i_plus1th_seg[1][0] and i_th_seg[1][1] > i_plus1th_seg[1][1])

            if contained:
                # ignore the smaller segment
                print('remove contained rect.')
                if i_th_seg[1][0] > x_pointer:
                    cleaned_seg.append(i_th_seg)
                    x_pointer = i_th_seg[1][0]

            elif i_plus1th_seg[0][0] - i_th_seg[1][0] <= dist:
                # the two segments overlap or are close: replace them by their bounding box
                print('merge segments')
                x_min = min(i_th_seg[0][0], i_plus1th_seg[0][0])
                y_min = min(i_th_seg[0][1], i_plus1th_seg[0][1])
                x_max = max(i_th_seg[1][0], i_plus1th_seg[1][0])
                y_max = max(i_th_seg[1][1], i_plus1th_seg[1][1])

                append_seg = [(x_min, y_min), (x_max, y_max)]
                if x_max > x_pointer:
                    cleaned_seg.append(append_seg)
                    x_pointer = x_max

            else:
                # no merge: keep the i-th segment, and the last segment on the final iteration
                if i_th_seg[1][0] > x_pointer:
                    cleaned_seg.append(i_th_seg)
                if i_plus1th_seg[1][0] > x_pointer and i == len(sorted_seg) - 2:
                    cleaned_seg.append(i_plus1th_seg)

        return cleaned_seg

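It does not work as expected, because the segments get updated dynamically while we loop through the segment list. Thanks for any help with this.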
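To see where the overlapping boxes in the output below come from, here is a simplified model of the loop that covers only its merge branch: each box is merged with its original right-hand neighbour, never with the result of an earlier merge. This is an illustrative sketch, not the original code; `union_box` and `pairwise_merge` are hypothetical names.

```python
def union_box(a, b):
    """Smallest [(x1, y1), (x2, y2)] box that covers both a and b."""
    return [(min(a[0][0], b[0][0]), min(a[0][1], b[0][1])),
            (max(a[1][0], b[1][0]), max(a[1][1], b[1][1]))]


def pairwise_merge(seg, dist=20):
    """Merge each adjacent pair of the sorted boxes independently.

    A box in the middle of a chain (A-B-C) takes part in two merges, A+B and B+C,
    so the two results overlap instead of forming one box A+B+C.
    """
    sorted_seg = sorted(seg, key=lambda box: box[0][0])
    merged = []
    for a, b in zip(sorted_seg, sorted_seg[1:]):
        if b[0][0] - a[1][0] <= dist:
            merged.append(union_box(a, b))
    return merged
```

On the sample segments, the box `[(419,0),(507,50)]` is merged once with its left neighbour and once with its right neighbour, which reproduces the two overlapping boxes in the reported output.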

seg_scp = [[(75,0),(189,52)],[(126,0),(243,61)],[(347,0),(419,50)],[(419,0),(507,50)],[(507,13),(668,70)]]

segment_cleaner(seg_scp)

Output: [[(75,0),(243,61)],[(347,0),(507,50)],[(419,0),(668,70)]]

Expected output: [[(75,0),(243,61)],[(347,0),(668,70)]]
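One way to get the expected output is to compare each segment with the last *merged* segment rather than with its original neighbour, and to grow that merged box for as long as the next segment overlaps it, is contained in it, or starts within `dist` pixels of it. This is a minimal sketch of a standard sort-and-sweep interval merge on the x-axis, assuming all boxes lie on one text line and use the `[(x1, y1), (x2, y2)]` format above; `merge_segments` is a hypothetical name, not part of the original code.

```python
def merge_segments(seg, dist=20):
    """Merge boxes on one text line whose x-ranges overlap, nest, or are at most dist px apart.

    seg: list of [(x1, y1), (x2, y2)] boxes (top-left and bottom-right corners).
    Returns a new list of merged boxes, ordered left to right.
    """
    if not seg:
        return []

    # Sort left to right by the left edge.
    sorted_seg = sorted(seg, key=lambda box: box[0][0])

    # The leftmost box starts the first running merged box.
    merged = [[sorted_seg[0][0], sorted_seg[0][1]]]

    for box in sorted_seg[1:]:
        last = merged[-1]
        # Compare against the running merged box, not the original neighbour, so a
        # chain of overlapping boxes keeps growing one result box.
        if box[0][0] - last[1][0] <= dist:
            merged[-1] = [
                (min(last[0][0], box[0][0]), min(last[0][1], box[0][1])),
                (max(last[1][0], box[1][0]), max(last[1][1], box[1][1])),
            ]
        else:
            # Gap is too large: start a new merged box.
            merged.append([box[0], box[1]])

    return merged
```

Because the boxes are sorted by their left edge, a box contained in the current merged box always has a gap of at most 0, so containment needs no separate check. Called on `seg_scp`, this returns `[[(75, 0), (243, 61)], [(347, 0), (668, 70)]]`, the expected output.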

