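<p>One trick for vectorizing this is to build an array <code>comb[i] = buffer1[i]+buffer2[i-1]*voc_size</code> for each pair of series. Each combination then gets a unique value in the array, and you can recover the combination with <code>v1[i] = comb[i] % voc_size, v2[i] = comb[i]//voc_size</code>. As long as the number of series is not very high (around 10000, I think), there is no need to vectorize any further.</p>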
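<p>As a minimal sketch of the encoding described above: the two small buffers below are made up for illustration, and <code>voc_size = 29</code> is simply the value used in the timing run. Note that <code>np.roll</code> wraps around, so the first element is paired with the last one.</p>

```python
import numpy as np

voc_size = 29  # vocabulary size, same value as in the timing run

# Two tiny hypothetical buffers with symbols in [0, voc_size)
buffer1 = np.array([3, 7, 7, 0, 28])
buffer2 = np.array([5, 1, 1, 28, 0])

# Pair each value of buffer1 with the previous value of buffer2.
# np.roll wraps around, so position 0 is paired with the last element.
prev = np.roll(buffer2, 1)
comb = buffer1 + prev * voc_size  # one unique integer per (v1, v2) pair

# Decode: the remainder is the buffer1 symbol, the quotient the buffer2 symbol
v1 = comb % voc_size
v2 = comb // voc_size

print(comb)
print(v1, v2)
```

<p>Because every symbol is smaller than <code>voc_size</code>, the remainder and the quotient never interfere, which is why a single call to <code>np.unique</code> on <code>comb</code> can count pairs.</p>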
<pre><code>import numpy as np

def support_vectorized(data, num_series, resolutions, buffer_size, vocab_size):
    ratios = np.zeros((num_series, vocab_size, num_series, vocab_size, resolutions))
    prev = np.roll(data, 1, axis=2)  # Get previous values
    prev *= vocab_size  # To separate prev from data
    for i, series in enumerate(data):
        for j, prev_series in enumerate(prev):
            comb = series + prev_series
            for k, buffer in enumerate(comb):
                # Count each (current, previous) combination in this buffer
                idx, counts = np.unique(buffer, return_counts=True)
                v = idx % vocab_size    # current value from series i
                v2 = idx // vocab_size  # previous value from series j
                ratios[i, v, j, v2, k] = counts/buffer_size
    return ratios
</code></pre>
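<p>However, if the number of series <code>S</code> or the number of resolutions <code>R</code> is larger, it is possible to vectorize fully, but this uses a lot of memory:</p>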
^{pr2}$
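<p>To get a rough feel for the memory cost, here is a back-of-the-envelope estimate. It assumes (this is my assumption, not something stated above) that a fully vectorized version materializes the combined values for every pair of series at once, i.e. an array of shape <code>(S, S, R, B)</code> of 64-bit integers, and it uses the sizes from the timing run.</p>

```python
import numpy as np

S, R, B, voc_size = 100, 5, 1000, 29  # sizes from the timing run
itemsize = np.dtype(np.int64).itemsize  # assumption: 64-bit integers

# Assumed fully vectorized intermediate: all pairs of series at once
full_bytes = S * S * R * B * itemsize
# Looping over the first series: one (S, R, B) array at a time
per_series_bytes = S * R * B * itemsize
# The result array of float64 ratios is needed in every variant
ratios_bytes = S * voc_size * S * voc_size * R * np.dtype(np.float64).itemsize

print(full_bytes / 1e6, "MB")        # 400.0 MB
print(per_series_bytes / 1e6, "MB")  # 4.0 MB
print(ratios_bytes / 1e6, "MB")      # 336.4 MB
```

<p>Under these assumptions the intermediate array grows quadratically with <code>S</code>, while a loop over one series keeps it linear.</p>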
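<p>However, for <code>S=100</code> the fully vectorized version is slower than the previous solution. A compromise is to keep one for loop over the series to reduce memory usage:</p>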
<pre><code>def row_unique2(comb):
    # Sort each buffer so that equal values become adjacent (sorts in place)
    comb.sort(axis=-1)
    # True wherever a new run of equal values starts along the last axis
    changes = np.concatenate((
        np.ones((comb.shape[0], comb.shape[1], 1), dtype="bool"),
        comb[:, :, 1:] != comb[:, :, :-1]), axis=-1)
    vals = comb[changes]
    idxs = np.nonzero(changes)
    # Run lengths: distance to the next run start, or to the end of the
    # buffer when the next start is at position 0 (i.e. in a new buffer)
    tmp = np.hstack((idxs[-1], 0))
    counts = np.where(tmp[1:], np.diff(tmp), comb.shape[-1]-tmp[:-1])
    return idxs, vals, counts

def supports_half_vectorized(data, num_series, resolutions, buffer_size, vocab_size):
    prev = np.roll(data, 1, axis=2)*vocab_size
    ratios = np.zeros((num_series, vocab_size, num_series, vocab_size, resolutions))
    for i, series in enumerate(data):
        comb = series + prev  # broadcasts series i against all previous series
        idxs, vals, counts = row_unique2(comb)
        ratios[i, vals % vocab_size, idxs[0], vals // vocab_size, idxs[1]] = counts/buffer_size
    return ratios
</code></pre>
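<p>The counting trick inside <code>row_unique2</code> may be easier to follow on a single buffer. The following is a simplified one-dimensional sketch (the sample values are made up): sort, mark where a new run of equal values starts, and take the distances between run starts as counts. It is checked against <code>np.unique</code>.</p>

```python
import numpy as np

buffer = np.array([4, 1, 4, 2, 1, 4])  # hypothetical sample values

s = np.sort(buffer)
# True wherever a new run of equal values starts
starts = np.concatenate(([True], s[1:] != s[:-1]))
vals = s[starts]
# Run lengths: distance between consecutive run starts, closed by len(s)
pos = np.flatnonzero(starts)
counts = np.diff(np.append(pos, len(s)))

# Reference result for comparison
ref_vals, ref_counts = np.unique(buffer, return_counts=True)
print(vals, counts)
print(np.array_equal(vals, ref_vals), np.array_equal(counts, ref_counts))
```

<p>The multi-row version above cannot simply append the buffer length once, because the run starts of many buffers are flattened into one array. That is why it appends a <code>0</code> and uses <code>np.where</code> to detect the last run of each buffer.</p>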
<p>The run times of the different solutions show that <code>supports_half_vectorized</code> is the fastest:</p>
<pre><code>In [41]: S, R, B, voc_size = (100, 5, 1000, 29)
In [42]: data = np.random.randint(voc_size, size=S*R*B).reshape((S, R, B))
In [43]: %timeit support_vectorized(data, S, R, B, voc_size)
1 loop, best of 3: 4.84 s per loop
In [44]: %timeit supports_full_vectorized(data, S, R, B, voc_size)
1 loop, best of 3: 5.3 s per loop
In [45]: %timeit supports_half_vectorized(data, S, R, B, voc_size)
1 loop, best of 3: 4.36 s per loop
In [46]: %timeit supports_4_loop(data, S, R, B, voc_size)
1 loop, best of 3: 36.7 s per loop
</code></pre>