<p>If you use <code>flatten=False</code> to create a "grid" of windows over the image:</p>
<pre><code>import numpy as np
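# NOTE: scipy.misc.lena was removed from later SciPy releases; any 512x512
# grayscale array can stand in for it here
from scipy.misc import lena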
from matplotlib import pyplot as plt
img = lena()
print(img.shape)
# (512, 512)
# make a 64x64 pixel sliding window on img.
win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)
print(win.shape)
# (8, 8, 64, 64)
# i.e. (img_height / win_height, img_width / win_width, win_height, win_width)
plt.imshow(win[4, 4, ...])
plt.draw()
# grid position [4, 4] contains Lena's eye and nose
</code></pre>
<p>To get the corresponding pixel coordinates, you could do something like this:</p>
^{pr2}$
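<p>A minimal sketch of such a helper, assuming that windows start at pixel (0, 0) and step by a fixed number of pixels per grid position. It is an illustrative stand-in, not necessarily the original definition of <code>get_win_pixel_coords()</code>; the <code>shift_size</code> parameter name is an assumption.</p>

```python
def get_win_pixel_coords(grid_pos, win_shape, shift_size=None):
    """Return (top, bottom, left, right) pixel bounds of one window.

    Hypothetical reconstruction: assumes windows start at pixel (0, 0) and
    step by `shift_size` pixels, defaulting to the window shape (no overlap).
    """
    if shift_size is None:
        shift_size = win_shape
    grid_row, grid_col = grid_pos
    shift_rows, shift_cols = shift_size
    win_rows, win_cols = win_shape
    top = grid_row * shift_rows
    left = grid_col * shift_cols
    return top, top + win_rows, left, left + win_cols


print(get_win_pixel_coords((4, 4), (64, 64)))
# (256, 320, 256, 320)
```

<p>The bottom and right values are exclusive, so they can be used directly as slice bounds: <code>img[t:b, l:r]</code>.</p>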
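<p>With <code>flatten=True</code>, the 8x8 grid of 64x64 pixel windows is flattened into a 64-long vector of 64x64 pixel windows. In that case you can use something like <code>np.unravel_index</code> to convert the 1D vector index into a tuple of grid indices, then use those to get the pixel coordinates as above:</p>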
<pre><code>win = sliding_window(img, (64, 64), flatten=True)
grid_pos = np.unravel_index(12, (8, 8))
t, b, l, r = get_win_pixel_coords(grid_pos, (64, 64))
print(np.all(img[t:b, l:r] == win[12]))
# True
</code></pre>
<hr/>
<p>OK, I'll try to address some of the questions you raised in the comments.</p>
<blockquote>
<p>I want the pixel location of the window relative to the actual pixel dimensions original image.</p>
</blockquote>
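<p>Perhaps I wasn't clear enough: you can already do this using something like my <code>get_win_pixel_coords()</code> function, which gives you the top, bottom, left and right coordinates of the window relative to the image. For example:</p>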
<pre><code>win = sliding_window(img, (64, 64), shiftSize=None, flatten=False)
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(win[4, 4])
ax1.plot(8, 9, 'oy')  # position of Lena's eye (x=8, y=9), relative to this window
t, b, l, r = get_win_pixel_coords((4, 4), (64, 64))
ax2.imshow(img)
# plot() takes (x, y), i.e. (column, row): offset x by the left edge, y by the top edge
ax2.plot(l + 8, t + 9, 'oy')  # position of Lena's eye, relative to whole image
plt.show()
</code></pre>
<p>Also note that I've updated <code>get_win_pixel_coords()</code> to handle cases where <code>shiftSize</code> is not {<cd7>} (i.e. the windows don't perfectly tile the image with no overlap).</p>
<blockquote>
<p>So I'm guessing that in that case, I should just make the grid be equal to the original image's dimensions, is that right? (instead of using 8x8).</p>
</blockquote>
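<p>No. If the windows tile the image without overlap (i.e. <code>shiftSize=None</code>, which is what I've assumed so far), then making the grid dimensions equal to the pixel dimensions of the image would mean that every window contains just a single pixel!</p>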
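<p>To make the relationship concrete, here is a rough sketch of how the grid shape follows from the image size, window size and shift. The helper <code>window_grid_shape()</code> is hypothetical, and it assumes that only windows lying entirely inside the image are kept, which may not match what <code>sliding_window()</code> does at the edges. The 64x64 window for the 240x360 image is also just an assumed value.</p>

```python
def window_grid_shape(img_shape, win_shape, shift_size=None):
    """Number of windows along each axis (hypothetical helper).

    Assumes only windows that fit entirely inside the image are kept and
    that the shift defaults to the window shape (non-overlapping tiles).
    """
    if shift_size is None:
        shift_size = win_shape
    return tuple((n - w) // s + 1
                 for n, w, s in zip(img_shape, win_shape, shift_size))


print(window_grid_shape((512, 512), (64, 64)))
# (8, 8)
print(window_grid_shape((512, 512), (1, 1)))
# (512, 512)  -> grid equals image size only when each window is 1 pixel
print(window_grid_shape((240, 360), (64, 64), (10, 10)))
# (18, 30)
```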
<blockquote>
<p>So in my case, for an image of width: 360 and height: 240, would that mean I use this line: <code>grid_pos = np.unravel_index(*12*, (240, 360))</code>. Also, what does 12 refer to in this line?</p>
</blockquote>
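<p>As I said, it doesn't make sense to make the "grid size" equal to the image dimensions, since every window would then contain only a single pixel (at least, assuming the windows are non-overlapping). The 12 is an index into the flattened grid of windows, for example:</p>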
<pre><code>x = np.arange(25).reshape(5, 5) # 5x5 grid containing numbers from 0 ... 24
x_flat = x.ravel() # flatten it into a 25-long vector
print(x_flat[12]) # the element at index 12 in the flattened vector
# 12
row, col = np.unravel_index(12, (5, 5)) # corresponding row/col index in x
print(x[row, col])
# 12
</code></pre>
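<p>Going the other way, <code>np.ravel_multi_index</code> turns a grid position back into the flat index. A small round-trip sketch for the 8x8 grid used earlier:</p>

```python
import numpy as np

grid_shape = (8, 8)
flat_idx = 12

# flat index -> (row, col) in the grid
grid_pos = np.unravel_index(flat_idx, grid_shape)
print(tuple(int(i) for i in grid_pos))
# (1, 4)

# (row, col) in the grid -> flat index
back = np.ravel_multi_index(grid_pos, grid_shape)
print(int(back))
# 12
```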
<blockquote>
<p>I am shifting 10 pixels with each window, and the first sliding window starts from coordinates 0x0 on the image, and the second starts from 10x10, etc, then I want it the program to return not just the window contents but the coordinates corresponding to each window, i.e. 0,0, and then 10,10, etc</p>
</blockquote>
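<p>As mentioned above, you can already get the position of the window relative to the image from the top, bottom, left and right coordinates returned by <code>get_win_pixel_coords()</code>. If you really want to, you could wrap this up in a single function:</p>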
<pre><code>def get_pixels_and_coords(win_grid, grid_pos):
    pix = win_grid[grid_pos]  # grid_pos must be a (row, col) tuple
    tblr = get_win_pixel_coords(grid_pos, pix.shape)
    return pix, tblr

# e.g.:
pix, tblr = get_pixels_and_coords(win, (3, 4))
</code></pre>
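<p>For the 10-pixel shift described in the comment, the top-left corner of every window can also be listed directly. This is only a sketch: the 64x64 window size is an assumed value, and it assumes windows are visited in row-major order and must fit entirely inside the image. Under those assumptions the second window starts at (0, 10), not (10, 10), because the window steps along the columns first.</p>

```python
from itertools import product

img_h, img_w = 240, 360   # image size from the comment
win_h, win_w = 64, 64     # assumed window size
shift = 10                # pixels between successive windows

# Top and left edges of every window that fits inside the image
tops = range(0, img_h - win_h + 1, shift)
lefts = range(0, img_w - win_w + 1, shift)

# (top, left) of each window, in row-major order
origins = list(product(tops, lefts))

print(len(tops), len(lefts))
# 18 30
print(origins[:3])
# [(0, 0), (0, 10), (0, 20)]
print(origins[30])
# (10, 0)
```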
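<p>If you need the coordinates of every pixel in the window relative to the image, another trick is to construct arrays containing the row and column indices of every pixel in the image, then apply your sliding window to those:</p>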
<pre><code>ridx, cidx = np.indices(img.shape)
r_win = sliding_window(ridx, (64, 64), shiftSize=None, flatten=False)
c_win = sliding_window(cidx, (64, 64), shiftSize=None, flatten=False)
pix = win[3, 4] # pixel values
r = r_win[3, 4] # row index of every pixel in the window
c = c_win[3, 4] # column index of every pixel in the window
</code></pre>
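<p>The index-array trick can be checked in isolation. The sketch below does not use <code>sliding_window()</code>; it substitutes NumPy's own <code>sliding_window_view</code> (available from NumPy 1.20) and keeps every 64th window to get non-overlapping tiles. The <code>tile()</code> helper and the stand-in image are assumptions for illustration only.</p>

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

img = np.arange(512 * 512).reshape(512, 512)  # stand-in for a real image
ridx, cidx = np.indices(img.shape)            # row / column index of every pixel


def tile(a, win=(64, 64)):
    # Hypothetical helper: all windows, then every win-th one along each
    # axis, which leaves a grid of non-overlapping tiles
    return sliding_window_view(a, win)[::win[0], ::win[1]]


win, r_win, c_win = tile(img), tile(ridx), tile(cidx)
print(win.shape)
# (8, 8, 64, 64)

r, c = r_win[3, 4], c_win[3, 4]
print(int(r[0, 0]), int(c[0, 0]))
# 192 256

# Indexing the image with the per-pixel coordinates recovers the window
print(np.array_equal(img[r, c], win[3, 4]))
# True
```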