<p>In the quest for a vectorized and faster approach, I learned a few things along the way.</p>
<p>1) First off, there is a dependency between the iterators at <code>"for j in range(i)"</code>. From my earlier experience, particularly when trying to solve such problems in <code>MATLAB</code>, it appeared that such a dependency could be taken care of with a <a href="http://mathworld.wolfram.com/LowerTriangularMatrix.html" rel="nofollow">lower triangular matrix</a>, so <code>np.tril</code> should work there. Thus, a fully vectorized (though not memory-efficient) solution, as it creates an intermediate <code>(N,N)</code>-shaped array before the final reduction to an array of shape <code>(N,)</code>, would be -</p>
<pre><code>def fully_vectorized(a,b):
    return np.tril(np.einsum('ijk,jil->kl',a,b),-1).sum(1)
</code></pre>
<p>2) The next trick/idea was to keep one loop for the iterator <code>i</code> in <code>for i in range(N)</code>, but insert that dependency with indexing and use <a href="http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.einsum.html" rel="nofollow"><code>np.einsum</code></a> to perform all those multiplications and summations. Its benefit would be memory efficiency. The implementation would look like this -</p>
<pre><code>def einsum_oneloop(a,b):
    d = np.zeros(N)
    for i in range(N):
        d[i] = np.einsum('ij,jik->',a[:,:,i],b[:,:,np.arange(i)])
    return d
</code></pre>
<p>There are two more obvious approaches, keeping both loops explicit and performing the innermost reduction with either <code>np.tensordot</code> or <code>np.einsum</code> -</p>
<pre><code>def tensordot_twoloop(a,b):
    d = np.zeros(N)
    for i in range(N):
        for j in range(i):
            d[i] += np.tensordot(a[:,:,i],b[:,:,j], axes=([1,0],[0,1]))
    return d

def einsum_twoloop(a,b):
    d = np.zeros(N)
    for i in range(N):
        for j in range(i):
            d[i] += np.einsum('ij,ji->',a[:,:,i],b[:,:,j])
    return d
</code></pre>
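<p>Before timing anything, it is worth checking that the loopy and vectorized forms agree. A minimal self-contained sanity check (the small array sizes here are arbitrary, chosen only for a quick test) inlines the two-loop reductions and compares them against the fully vectorized expression:</p>

```python
import numpy as np

# Small arbitrary sizes for a correctness check
np.random.seed(0)
m, n, N = 4, 5, 6
a = np.random.rand(m, n, N)
b = np.random.rand(n, m, N)

# Two-loop variants, inlined from the functions above
d_tensordot = np.zeros(N)
d_einsum = np.zeros(N)
for i in range(N):
    for j in range(i):
        d_tensordot[i] += np.tensordot(a[:, :, i], b[:, :, j],
                                       axes=([1, 0], [0, 1]))
        d_einsum[i] += np.einsum('ij,ji->', a[:, :, i], b[:, :, j])

# Fully vectorized form: keep only the strictly lower triangle (j < i)
d_vec = np.tril(np.einsum('ijk,jil->kl', a, b), -1).sum(1)

assert np.allclose(d_tensordot, d_vec)
assert np.allclose(d_einsum, d_vec)
```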
<p><strong>Runtime test</strong></p>
<p>Let's compare all five approaches posted thus far to solve the problem, including the one posted in the question.</p>
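<p>The <code>all_loopy</code> baseline from the question isn't reproduced above, so here is a hedged sketch of it, consistent with the reductions used by the other functions (the exact loop structure in the question may differ):</p>

```python
import numpy as np

def all_loopy(a, b):
    # Fully loopy baseline: d[i] accumulates sum_{p,q} a[p,q,i] * b[q,p,j]
    # over all j < i, matching np.einsum('ij,ji->', a[:,:,i], b[:,:,j])
    m, n, N = a.shape
    d = np.zeros(N)
    for i in range(N):
        for j in range(i):
            for p in range(m):
                for q in range(n):
                    d[i] += a[p, q, i] * b[q, p, j]
    return d
```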
<p>Case #1:</p>
<pre><code>In [26]: # Input arrays with random elements
...: m,n,N = 20,20,20
...: a = np.random.rand(m,n,N)
...: b = np.random.rand(n,m,N)
...:
In [27]: %timeit all_loopy(a,b)
...: %timeit tensordot_twoloop(a,b)
...: %timeit einsum_twoloop(a,b)
...: %timeit einsum_oneloop(a,b)
...: %timeit fully_vectorized(a,b)
...:
10 loops, best of 3: 79.6 ms per loop
100 loops, best of 3: 4.97 ms per loop
1000 loops, best of 3: 1.66 ms per loop
1000 loops, best of 3: 585 µs per loop
1000 loops, best of 3: 684 µs per loop
</code></pre>
<p>Case #2:</p>
<pre><code>In [28]: # Input arrays with random elements
...: m,n,N = 50,50,50
...: a = np.random.rand(m,n,N)
...: b = np.random.rand(n,m,N)
...:
In [29]: %timeit all_loopy(a,b)
...: %timeit tensordot_twoloop(a,b)
...: %timeit einsum_twoloop(a,b)
...: %timeit einsum_oneloop(a,b)
...: %timeit fully_vectorized(a,b)
...:
1 loops, best of 3: 3.1 s per loop
10 loops, best of 3: 54.1 ms per loop
10 loops, best of 3: 26.2 ms per loop
10 loops, best of 3: 27 ms per loop
10 loops, best of 3: 23.3 ms per loop
</code></pre>
<p>Case #3 (leaving out <code>all_loopy</code> for being very slow):</p>
<pre><code>In [30]: # Input arrays with random elements
...: m,n,N = 100,100,100
...: a = np.random.rand(m,n,N)
...: b = np.random.rand(n,m,N)
...:
In [31]: %timeit tensordot_twoloop(a,b)
...: %timeit einsum_twoloop(a,b)
...: %timeit einsum_oneloop(a,b)
...: %timeit fully_vectorized(a,b)
...:
1 loops, best of 3: 1.08 s per loop
1 loops, best of 3: 744 ms per loop
1 loops, best of 3: 568 ms per loop
1 loops, best of 3: 866 ms per loop
</code></pre>
<p>Going by the numbers, <code>einsum_oneloop</code> looks pretty good to me, whereas <code>fully_vectorized</code> could be used when dealing with small to decent-sized arrays!</p>