<h2>Alternative using Cython</h2>
<p>If you are able to use Cython, you could use something like the following. To run it from a Jupyter notebook, first load the extension:</p>
<pre><code>%load_ext Cython
</code></pre>
<p>Then, in a separate cell, run:</p>
<pre><code>%%cython -a
from cython cimport boundscheck, wraparound
cimport numpy as np
import numpy as np

@boundscheck(False)
@wraparound(False)
def cython_sticky_cumsum(int[::1] a, int[::1] b):
    cdef size_t i, N = a.size
    cdef np.ndarray[np.int32_t, ndim=1] result = np.empty(N, dtype=np.int32)
    # Note: with bounds checking off, result[i-1] is read out of bounds at
    # i == 0 unless a[0] == 1 or b[0] == 1.
    for i in range(N):
        if a[i] == 1:
            result[i] = 1            # a sets the flag
        elif b[i] == 1:
            result[i] = 0            # b clears the flag
        else:
            result[i] = result[i-1]  # otherwise carry the previous value
    return result
</code></pre>
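<p>If you care about performance or are working with large arrays, the above may be the better choice. Otherwise, I suppose it comes down to which version you find more readable.</p>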
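<pre><code># int[::1] expects C ints (32-bit); NumPy's default integer is 64-bit on most platforms
a = np.array([1, 0, 0, 0, 1, 0], dtype=np.int32)
b = np.array([0, 0, 1, 0, 0, 0], dtype=np.int32)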
cython_sticky_cumsum(a, b)
# array([1, 1, 0, 0, 1, 1])
</code></pre>
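<p>To make the set/clear/carry logic easier to follow, and to cross-check the compiled function on small inputs, here is a minimal pure-Python sketch of the same loop. The name <code>sticky_cumsum_reference</code> and the <code>initial</code> parameter are hypothetical additions for illustration; this is not the <code>sticky_cumsum</code> function timed below.</p>

```python
import numpy as np


def sticky_cumsum_reference(a, b, initial=0):
    """Pure-Python reference for the set/clear/carry loop.

    `initial` is an assumption (not in the original): the state used
    before either `a` or `b` has been 1.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("a and b must be 1-D arrays of equal length")

    result = np.empty(a.shape[0], dtype=np.int32)
    state = initial
    for i in range(a.shape[0]):
        if a[i] == 1:
            state = 1      # a sets the flag
        elif b[i] == 1:
            state = 0      # b clears the flag
        result[i] = state  # otherwise the previous state carries over
    return result


a = np.array([1, 0, 0, 0, 1, 0])
b = np.array([0, 0, 1, 0, 0, 0])
print(sticky_cumsum_reference(a, b))  # [1 1 0 0 1 1]
```

<p>Tracking the state in a separate variable avoids reading <code>result[i-1]</code> at <code>i == 0</code>, at the cost of having to choose a starting value.</p>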
<h2>Rough benchmark</h2>
<p>For larger arrays, for example:</p>
<pre><code>a = np.tile(np.array([1, 0, 0, 0, 1, 0], dtype=np.int32), 1000000)
b = np.tile(np.array([0, 0, 1, 0, 0, 0], dtype=np.int32), 1000000)
</code></pre>
<p>timing both versions:</p>
<pre><code>%timeit cython_sticky_cumsum(a,b)
%timeit sticky_cumsum(a, b)
</code></pre>
<p>gives (Cython version first):</p>
<pre><code>28.4 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.5 s ± 97.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
</code></pre>