Sliding window - how to get the window location on the image

Published 2024-09-30 08:26:09

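Referring to this great sliding window implementation in Python: https://github.com/keepitsimple/ocrtest/blob/master/sliding_window.py#blob_contributors_box, my question is: where in the code can I see the location of the current window on the image? Or how can I find out its location?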

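After lines 72 and 85, I tried printing out shape and {}, but obviously I am getting nowhere. In the norm_shape function I printed out tuple, but the output was just the window dimensions (if I understood that correctly).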

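But I need more than the dimensions, such as width and height: I also need to know exactly where in the image the window is extracted from, as pixel coordinates or as rows/columns of the image.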


2 answers

It may be easier to see what is going on if you use flatten=False to create a "grid" of windows over the image:

import numpy as np
from scipy.misc import lena
from matplotlib import pyplot as plt

img = lena()
print(img.shape)
# (512, 512)

# make a 64x64 pixel sliding window on img. 
win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

print(win.shape)
# (8, 8, 64, 64)
# i.e. (img_height / win_height, img_width / win_width, win_height, win_width)

plt.imshow(win[4, 4, ...])
plt.draw()
# grid position [4, 4] contains Lena's eye and nose

To get the corresponding pixel coordinates, you could do something like this:

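[The code block defining get_win_pixel_coords() is missing from the source; a version of this function appears in the second answer below.]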
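A minimal sketch of that mapping from grid position to pixel bounds, assuming windows do not overlap unless a shift is given. The helper name `window_bounds` and the synthetic image are illustrative assumptions, not part of the original answer, and the tiles are built with plain reshaping rather than with sliding_window():

```python
import numpy as np

def window_bounds(grid_pos, win_shape, shift=None):
    """Return (top, bottom, left, right) pixel bounds of one window.

    Hypothetical helper: `shift` defaults to the window shape, i.e.
    windows that tile the image without overlapping.
    """
    if shift is None:
        shift = win_shape
    top = grid_pos[0] * shift[0]
    left = grid_pos[1] * shift[1]
    return top, top + win_shape[0], left, left + win_shape[1]

# Synthetic 512x512 "image" whose pixel values are all distinct
img = np.arange(512 * 512).reshape(512, 512)

# Non-overlapping 64x64 tiles built with plain reshaping (not sliding_window)
tiles = img.reshape(8, 64, 8, 64).swapaxes(1, 2)   # shape (8, 8, 64, 64)

t, b, l, r = window_bounds((4, 4), (64, 64))
print(t, b, l, r)                                  # 256 320 256 320
print(np.array_equal(img[t:b, l:r], tiles[4, 4]))  # True
```

Slicing the image with the returned bounds gives back exactly the window stored at that grid position.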

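With flatten=True, the 8x8 grid of 64x64-pixel windows is flattened into a 64-long vector of 64x64-pixel windows. In that case you can use something like np.unravel_index to convert the 1D vector index into a tuple of grid indices, then use these to get the pixel coordinates as above: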

win = sliding_window(img, (64, 64), flatten=True)

grid_pos = np.unravel_index(12, (8, 8))
t, b, l, r = get_win_pixel_coords(grid_pos, (64, 64))

print(np.all(img[t:b, l:r] == win[12]))
# True

OK, I'll try to address some of the questions you raised in the comments.

I want the pixel location of the window relative to the actual pixel dimensions original image.

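Perhaps I wasn't clear enough: you can already do this with a function like my get_win_pixel_coords(), which gives the top, bottom, left and right coordinates of the window relative to the image. For example: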

win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)

fig, (ax1, ax2) = plt.subplots(1, 2)
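# ax.hold() was removed in Matplotlib 3.0; axes now keep earlier artists by default
ax1.imshow(win[4, 4])
ax1.plot(8, 9, 'oy')         # position of Lena's eye, relative to this window

t, b, l, r = get_win_pixel_coords((4, 4), (64, 64))

ax2.imshow(img)
# plot() takes (x, y), i.e. (column, row): add the left offset to x and the top offset to y
ax2.plot(l + 8, t + 9, 'oy') # position of Lena's eye, relative to whole image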

plt.show()

Also note that I have updated get_win_pixel_coords() to handle cases where shiftSize is not {} (i.e. the windows do not perfectly tile the image without overlap).

So I'm guessing that in that case, I should just make the grid be equal to the original image's dimensions, is that right? (instead of using 8x8).

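No. If the windows tile the image without overlap (i.e. shiftSize=None, which is what I have assumed so far), then making the grid dimensions equal to the pixel dimensions of the image would mean that every window contains just a single pixel!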
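To make the relationship between image size, window size and grid size concrete, here is a small sketch. The helper `grid_shape` is hypothetical; it applies the same formula as the sliding_window() source quoted further down, `(image - window) // shift + 1`, with the shift defaulting to the window size:

```python
def grid_shape(img_shape, win_shape, shift=None):
    """Number of window positions along each axis.

    Hypothetical helper: (image - window) // shift + 1 per axis,
    with shift defaulting to the window size (no overlap).
    """
    if shift is None:
        shift = win_shape
    return tuple((i - w) // s + 1 for i, w, s in zip(img_shape, win_shape, shift))

print(grid_shape((512, 512), (64, 64)))            # (8, 8)
print(grid_shape((240, 360), (64, 64), (10, 10)))  # (18, 30)
print(grid_shape((240, 360), (1, 1)))              # (240, 360): one pixel per window
```

The grid only matches the image dimensions in the last case, where each window is a single pixel.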

So in my case, for an image of width: 360 and height: 240, would that mean I use this line: grid_pos = np.unravel_index(*12*, (240, 360)). Also, what does 12 refer to in this line?

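As I said, it makes no sense to make the "grid size" equal to the image dimensions, since every window would then contain only a single pixel (at least, assuming the windows do not overlap). The 12 is an index into the flattened grid of windows, for example: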

x = np.arange(25).reshape(5, 5)    # 5x5 grid containing numbers from 0 ... 24
x_flat = x.ravel()                 # flatten it into a 25-long vector
print(x_flat[12])                  # the element at index 12 of the flattened vector
# 12
row, col = np.unravel_index(12, (5, 5))  # corresponding row/col index in x
print(x[row, col])
# 12

I am shifting 10 pixels with each window, and the first sliding window starts from coordinates 0x0 on the image, and the second starts from 10x10, etc, then I want it the program to return not just the window contents but the coordinates corresponding to each window, i.e. 0,0, and then 10,10, etc

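As mentioned above, you can already get the position of the window relative to the image from the top, bottom, left and right coordinates returned by get_win_pixel_coords(). If you really want, you could wrap this up in a single function: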

def get_pixels_and_coords(win_grid, grid_pos):
    pix = win_grid[grid_pos]
    tblr = get_win_pixel_coords(grid_pos, pix.shape)
    return pix, tblr

# e.g.:
pix, tblr = get_pixels_and_coords(win, (3, 4))
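For the 10-pixel shift described in the question, the coordinates can also be produced directly while iterating. The following is a rough sketch based on plain slicing, not on sliding_window(); the generator name `iter_windows` and the blank 240x360 test image are assumptions made for illustration:

```python
import numpy as np

def iter_windows(img, win_shape, shift):
    """Yield (top, left, window) for every window position.

    Hypothetical helper based on plain slicing; windows that would run
    past the image edge are skipped.
    """
    wr, wc = win_shape
    sr, sc = shift
    for top in range(0, img.shape[0] - wr + 1, sr):
        for left in range(0, img.shape[1] - wc + 1, sc):
            yield top, left, img[top:top + wr, left:left + wc]

img = np.zeros((240, 360))
coords = [(top, left) for top, left, _ in iter_windows(img, (64, 64), (10, 10))]
print(coords[:3])    # [(0, 0), (0, 10), (0, 20)]
print(len(coords))   # 540
```

Each yielded window is a view into the image, so no pixel data is copied while iterating.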

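If you want the coordinates of every pixel in the window relative to the image, another trick is to construct arrays containing the row and column indices of every pixel in the image, then apply the sliding window to these: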

ridx, cidx = np.indices(img.shape)
r_win = sliding_window(ridx, (64, 64), shiftSize=None, flatten=False)
c_win = sliding_window(cidx, (64, 64), shiftSize=None, flatten=False)

pix = win[3, 4]    # pixel values
r = r_win[3, 4]    # row index of every pixel in the window
c = c_win[3, 4]    # column index of every pixel in the window
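The same index-array trick can be tried without the sliding_window() function from the question. This sketch assumes NumPy 1.20 or later, which provides `numpy.lib.stride_tricks.sliding_window_view`; the small synthetic image, window size and shift are illustrative choices:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.arange(12 * 16).reshape(12, 16)   # small synthetic image
ridx, cidx = np.indices(img.shape)         # row / column index of every pixel

win_shape, shift = (4, 4), (2, 2)

def windows(a):
    # All windows at stride 1, then keep every `shift`-th position per axis
    return sliding_window_view(a, win_shape)[::shift[0], ::shift[1]]

win, r_win, c_win = windows(img), windows(ridx), windows(cidx)
print(win.shape)                               # (5, 7, 4, 4)

r, c = r_win[3, 4], c_win[3, 4]
print(r[0, 0], c[0, 0])                        # 6 8
print(np.array_equal(img[r, c], win[3, 4]))    # True
```

Indexing the image with the row and column windows recovers the pixel window, which confirms that the index arrays line up with the pixel data.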

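This is an update to @ali_m's answer, since scipy.misc.lena() is no longer available in SciPy > 0.17. Below is an example using the RGB image scipy.misc.face(), with slight modifications to the sliding window source code provided in the OP.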

import numpy as np
from scipy.misc import ascent, face
from matplotlib import pyplot as plt
from numpy.lib.stride_tricks import as_strided as ast

def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    if shift_size is None:
        shift_size = win_shape
    gr, gc = grid_pos
    sr, sc = shift_size
    wr, wc = win_shape
    top, bottom = gr * sr, (gr * sr) + wr
    left, right = gc * sc, (gc * sc) + wc

    return top, bottom, left, right


def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple,
    even for one-dimensional shapes.
    Parameters
        shape - an int, or a tuple of ints
    Returns
        a shape tuple
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass

    raise TypeError('shape must be an int, or a tuple of ints')


def sliding_window(a,ws,ss = None,flatten = True):
    '''
    Return a sliding window over a in any number of dimensions
    '''
    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)
    # convert ws, ss, and a.shape to numpy arrays
    ws = np.array(ws)
    ss = np.array(ss)
    shap = np.array(a.shape)
    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shap),len(ws),len(ss)]
    if 1 != len(set(ls)):
        raise ValueError(\
        'a.shape, ws and ss must all have the same length. They were %s' % str(ls))

    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shap):
        raise ValueError(\
        'ws cannot be larger than a in any dimension.\
 a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
    # how many slices will there be in each dimension?
    newshape = norm_shape(((shap - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    a = ast(a,shape = newshape,strides = newstrides)
    if not flatten:
        return a
    # Collapse strided so that it has one more dimension than the window.  I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
    # np.prod replaces np.product, which was removed in NumPy 2.0
    firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1
    #dim = filter(lambda i : i != 1,dim)
    return a.reshape(dim), newshape

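Adding the return variable newshape to sliding_window() lets you pass flatten=True and still know the layout of the grid created by the sliding window function. In my application, a flattened vector of computational windows is desirable because it is a convenient point at which to scale out the computation applied to each window.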

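If a 96x96 window (i.e. tile x tile) with 50% overlap in both directions is applied to an image of shape (768, 1024, 3), the input image can be padded before the sliding window is created, so that a whole number of windows fits with no remainder.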

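[The code block that pads the image and creates the windows is missing from the source; it defined pad_img, win and ind, which are used below.]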
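One possible way to do the padding step is sketched below. It assumes zero padding on the bottom and right edges only and uses a blank array as a stand-in for the RGB image; the helper `pad_amount` is hypothetical, and the original answer may have padded differently:

```python
import numpy as np

tile, shift = 96, 48                             # 96x96 windows, 50% overlap
img = np.zeros((768, 1024, 3), dtype=np.uint8)   # stand-in for the RGB image

def pad_amount(size, win, step):
    """Pixels to append so that (size - win) is a multiple of step."""
    return (-(size - win)) % step

pad_r = pad_amount(img.shape[0], tile, shift)
pad_c = pad_amount(img.shape[1], tile, shift)

# Pad rows and columns only; the colour channel axis is left untouched
pad_img = np.pad(img, ((0, pad_r), (0, pad_c), (0, 0)), mode='constant')
print(pad_img.shape)                      # (768, 1056, 3)

# Window positions per axis: (size - tile) // shift + 1
grid = tuple((s - tile) // shift + 1 for s in pad_img.shape[:2])
print(grid, grid[0] * grid[1])            # (15, 21) 315
```

Under these assumptions only the width needs padding (by 32 pixels), and the resulting grid has 15 rows and 21 columns.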

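The grid of computational windows contains 15 rows and 21 columns, for 315 computational windows in total. grid_pos can be determined from an index into the flattened vector of computational windows (i.e. win) together with ind[0] and {}. If we are interested in the computational window at index 239: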

grid_pos = np.unravel_index(239,(ind[0],ind[1]))
print(grid_pos)
#(11, 8)

The bounding coordinates of that computational window in the original image can then be found with:

t, b, l, r = get_win_pixel_coords(grid_pos, (96, 96), (48,48))
print(np.all(pad_img[t:b, l:r] == win[239]))
#True
