<p><strong>矢量化方法</strong></p>
<p>将<code>A</code>扩展到{<cd2>}之后的一种矢量化方法是-</p>
<pre><code>(1/(np.sqrt((A[:,None] - B)**2 + d**2))).sum(1)
</code></pre>
<p><strong>大型阵列的混合方法</strong></p>
<p>现在,对于大型数组,我们可能需要将数据分成块。在</p>
<p>因此,使用<code>BSZ</code>作为块大小,我们将使用一种混合方法,如-</p>
^{pr2}$
<hr/>
<h2>运行时测试</h2>
<p>方法-</p>
<pre><code>def original_app(A,B,d):
V = np.zeros(n)
for i in range(n):
xdist = A[i] - B
r = np.sqrt(xdist**2 + d**2)
dV = 1/r
V[i] = np.sum(dV)
return V
def vectorized_app1(A,B,d):
return (1/(np.sqrt((A[:,None] - B)**2 + d**2))).sum(1)
def vectorized_app2(A,B,d, BSZ = 100):
dsq = d**2
V = np.zeros((n//BSZ,BSZ))
for i in range(n//BSZ):
V[i] = (1/(np.sqrt((A[i*BSZ:(i+1)*BSZ,None] - B)**2 + dsq))).sum(1)
return V.ravel()
</code></pre>
<p>时间安排和验证-</p>
<pre><code>In [203]: # Setup inputs
...: n,m = 10000,2000
...: A = np.random.rand(n)
...: B = np.random.rand(m)
...: d = 10
...:
In [204]: out1 = original_app(A,B,d)
...: out2 = vectorized_app1(A,B,d)
...: out3 = vectorized_app2(A,B,d, BSZ = 100)
...:
...: print np.allclose(out1, out2)
...: print np.allclose(out1, out3)
...:
True
True
In [205]: %timeit original_app(A,B,d)
10 loops, best of 3: 133 ms per loop
In [206]: %timeit vectorized_app1(A,B,d)
10 loops, best of 3: 138 ms per loop
In [207]: %timeit vectorized_app2(A,B,d, BSZ = 100)
10 loops, best of 3: 65.2 ms per loop
</code></pre>
<p>我们可以使用参数块大小<code>BSZ</code>-</p>
<pre><code>In [208]: %timeit vectorized_app2(A,B,d, BSZ = 200)
10 loops, best of 3: 74.5 ms per loop
In [209]: %timeit vectorized_app2(A,B,d, BSZ = 50)
10 loops, best of 3: 67.4 ms per loop
</code></pre>
<p>因此,最好的方法似乎是在我的末尾提供一个块大小为<code>2x</code></strong>的加速。在</p>