<blockquote>
<p>According to <a href="https://murillogroupmsu.com/julia-set-speed-comparison/" rel="nofollow noreferrer">https://murillogroupmsu.com/julia-set-speed-comparison/</a>, numba used on pure python code is faster than numba used on python code that uses numpy. Is that generally true, and why?</p>
<p>In <a href="https://stackoverflow.com/a/25952400/4533188">https://stackoverflow.com/a/25952400/4533188</a> it is explained why numba on pure python is faster than numpy-python: numba sees more code and has more ways to optimize the code than numpy which only sees a small portion.</p>
</blockquote>
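<p>Numba simply replaces NumPy functions with its own implementations. These can be faster or slower, and the results can also differ. The problem lies in how this replacement happens. Quite often unnecessary temporary arrays and loops are involved that could be fused together.</p>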
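<p>Loop fusion and the removal of temporary arrays are not easy tasks. The behaviour also differs depending on whether you compile for a parallel target (where loop fusion works better) or for a single-threaded target.</p>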
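<p>To make the temporary-array point concrete, here is a minimal NumPy-only sketch. The function names <code>eval_with_temporaries</code> and <code>eval_with_buffer</code> are hypothetical. The first evaluates <code>a+b*2.+c*3.+d*4.</code> the way NumPy does, one operation at a time, allocating a new full-size array for each intermediate result. The second reuses two buffers through in-place operators and the ufunc <code>out=</code> argument. This cuts down the allocations, but it still makes one pass over memory per operation. It is not a truly fused loop like the one a compiler such as Numba can generate for <code>Test_nb</code>.</p>

```python
import numpy as np

def eval_with_temporaries(a, b, c, d):
    # NumPy evaluates b*2., c*3., d*4. and every addition separately;
    # each step allocates a new full-size temporary array.
    return np.sum(a + b * 2. + c * 3. + d * 4.)

def eval_with_buffer(a, b, c, d):
    # Only two arrays are allocated; everything else is done in place.
    acc = np.multiply(b, 2.)          # first allocation
    tmp = np.multiply(c, 3.)          # second allocation
    acc += a                          # in place: acc = b*2. + a
    acc += tmp                        # in place: + c*3.
    np.multiply(d, 4., out=tmp)       # reuse tmp instead of allocating
    acc += tmp                        # in place: + d*4.
    return np.sum(acc)

rng = np.random.default_rng(0)
a, b, c, d = (rng.random(1_000_000) for _ in range(4))
r1 = eval_with_temporaries(a, b, c, d)
r2 = eval_with_buffer(a, b, c, d)
print(r1, r2)
```

<p>Both versions perform the additions in the same order, so the element-wise values match. The difference lies only in how much memory is allocated and touched.</p>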
<p><strong>[Edit]</strong>
Optimizations done by the parallel accelerator, such as loop fusion (see <a href="https://numba.pydata.org/numba-doc/dev/user/parallel.html" rel="nofollow noreferrer">Section 1.10.4. Diagnostics</a>), can also be enabled in single-threaded mode by setting <code>parallel=True</code> and <code>nb.parfor.sequential_parfor_lowering = True</code>. <a href="https://github.com/numba/numba/issues/3092" rel="nofollow noreferrer">1</a></p>
<p><strong>Example</strong></p>
<pre><code>#only for single-threaded numpy test
import os
os.environ["OMP_NUM_THREADS"] = "1"

import numba as nb
import numpy as np

a=np.random.rand(100_000_000)
b=np.random.rand(100_000_000)
c=np.random.rand(100_000_000)
d=np.random.rand(100_000_000)

#Numpy version
#every expression is evaluated on its own
#the summation algorithm (pairwise summation) isn't equivalent to the algorithm I used below
def Test_np(a,b,c,d):
    return np.sum(a+b*2.+c*3.+d*4.)

#The same code, but for Numba (results and performance differ)
@nb.njit(fastmath=False,parallel=True)
def Test_np_nb(a,b,c,d):
    return np.sum(a+b*2.+c*3.+d*4.)

#the summation isn't fused; approximately the behaviour of Test_np_nb
#for a single-threaded target
@nb.njit(fastmath=False,parallel=True)
def Test_np_nb_eq(a,b,c,d):
    #first loop: write the element-wise expression into a temporary array
    TMP=np.empty(a.shape[0])
    for i in nb.prange(a.shape[0]):
        TMP[i]=a[i]+b[i]*2.+c[i]*3.+d[i]*4.

    #second loop: sum the temporary array (a reduction over prange)
    res=0.
    for i in nb.prange(a.shape[0]):
        res+=TMP[i]
    return res

#The usual way someone would implement this in Numba:
#one fused loop, no temporary array
@nb.njit(fastmath=False,parallel=True)
def Test_nb(a,b,c,d):
    res=0.
    for i in nb.prange(a.shape[0]):
        res+=a[i]+b[i]*2.+c[i]*3.+d[i]*4.
    return res
</code></pre>
<p><strong>Timings</strong></p>
<p><strong>Results</strong></p>
<pre><code>#single-threaded
res_1=Test_nb(a,b,c,d)
499977967.27572954
res_2=Test_np(a,b,c,d)
499977967.2756622
res_3=Test_np_nb(a,b,c,d)
499977967.2756614
res_4=Test_np_nb_eq(a,b,c,d)
499977967.2756614
#multi-threaded
res_1=Test_nb(a,b,c,d)
499977967.27572465
res_2=Test_np(a,b,c,d)
499977967.2756622
res_3=Test_np_nb(a,b,c,d)
499977967.27572465
res_4=Test_np_nb_eq(a,b,c,d)
499977967.27572465
</code></pre>
<p><strong>Conclusion</strong></p>
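<p>Which approach is best depends on the use case. Some algorithms can easily be written in a few lines of NumPy, while others are hard or impossible to implement in a vectorized way.</p>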
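<p>One algorithm that is awkward to vectorize is a recurrence in which each output depends on the previous output. An exponential moving average is an example. The name <code>ema</code> and the smoothing factor <code>alpha</code> are illustrative and do not come from the original answer. Because of the loop-carried dependency, there is no straightforward element-wise NumPy expression for it. As a plain loop it is trivial, and such loops are the kind of code that Numba's <code>@nb.njit</code> is meant to compile. The sketch below is plain Python, so it runs without Numba.</p>

```python
def ema(values, alpha):
    """Exponential moving average: out[i] = alpha*x[i] + (1-alpha)*out[i-1]."""
    if not values:
        return []
    out = [float(values[0])]  # seed the recurrence with the first sample
    for x in values[1:]:
        # each step needs the previous *output*, so the iterations cannot
        # be computed independently as one whole-array expression
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out

print(ema([0.0, 2.0, 2.0], 0.5))  # [0.0, 1.0, 1.5]
```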
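<p>I also chose a summation example on purpose. Doing everything in a single pass is easy and much faster. However, if I wanted the most accurate result, I would use a more sophisticated algorithm, such as the pairwise summation already implemented in NumPy. You could do the same in Numba, but that would be more work.</p>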
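<p>The following pure-Python sketch shows why the summation algorithm matters. It compares three strategies:</p>
<ul>
<li>a naive left-to-right sum, which is the strategy of the single fused loop in <code>Test_nb</code>;</li>
<li>a simplified pairwise sum, in the spirit of what <code>np.sum</code> does;</li>
<li><code>math.fsum</code>, which returns a correctly rounded result and serves as the reference.</li>
</ul>
<p>The helper names <code>naive_sum</code> and <code>pairwise_sum</code> are hypothetical, and the block size of 128 is an illustrative choice. The input is deliberately chosen to make the naive error visible. On real data the differences are usually small, like the ones in the results above.</p>

```python
import math

def naive_sum(xs):
    # left-to-right accumulation in a single variable; the worst-case
    # rounding error grows roughly linearly with len(xs)
    total = 0.0
    for x in xs:
        total += x
    return total

def pairwise_sum(xs, lo=0, hi=None, block=128):
    # recursively split the range in half and add the two partial sums;
    # the worst-case error grows only roughly with log(len(xs))
    if hi is None:
        hi = len(xs)
    if hi - lo <= block:
        return naive_sum(xs[lo:hi])  # small blocks are summed directly
    mid = (lo + hi) // 2
    return pairwise_sum(xs, lo, mid, block) + pairwise_sum(xs, mid, hi, block)

values = [0.1] * 1_000_000
naive = naive_sum(values)
pairwise = pairwise_sum(values)
exact = math.fsum(values)  # correctly rounded reference

print(naive, pairwise, exact)
print(abs(naive - exact), abs(pairwise - exact))
```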