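<p>The loop itself cannot be avoided, because each running total depends on the previous one, but it can be compiled to fast machine code with <code>numba</code>'s <code>njit</code>:</p>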
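<pre><code>from numba import njit, prange

@njit
def dynamic_cumsum(seq, index, max_value):
    cumsum = []   # collected [label, total] pairs
    running = 0
    # Plain @njit (no parallel=True): prange acts like range here.
    for i in prange(len(seq)):
        if running > max_value:
            # Limit exceeded: record the total and start over.
            cumsum.append([index[i], running])
            running = 0
        running += seq[i]
    # Flush whatever remains after the last row.
    cumsum.append([index[-1], running])
    return cumsum
</code></pre>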
<p>The index has to be passed in here, on the assumption that your index is not numeric/monotonically increasing.</p>
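<p>To check by hand what the loop records, here is a minimal pure-Python sketch of the same logic. The name <code>dynamic_cumsum_py</code> and the sample values and labels are made up for illustration; they are not the data from the question. Note that the label stored with each total is the row at which the overflow is detected, that is, the row after the one that pushed the total past the limit.</p>

```python
def dynamic_cumsum_py(seq, index, max_value):
    """Pure-Python mirror of the njit loop, handy for checking results by hand."""
    out = []
    running = 0
    for i in range(len(seq)):
        if running > max_value:
            # Overflow detected on entering row i: store its label and reset.
            out.append((index[i], running))
            running = 0
        running += seq[i]
    # The remainder is always reported under the last label.
    out.append((index[-1], running))
    return out


# Hypothetical sample data with non-numeric labels.
values = [2, 3, 5, 1, 4, 4, 2, 1]
labels = ["a", "b", "c", "d", "e", "f", "g", "h"]

result = dynamic_cumsum_py(values, labels, 5)
print(result)  # [('d', 10), ('g', 9), ('h', 3)]
```

<p>Because the labels are looked up in <code>index</code> rather than derived from the loop counter, the same code works for any label type in plain Python.</p>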
<hr/>
<p>If the index is an <code>Int64Index</code> whose labels match the row positions (0, 1, 2, ...), this can be shortened to:</p>
<pre><code>@njit
def dynamic_cumsum2(seq, max_value):
    cumsum = []
    running = 0
    for i in prange(len(seq)):  # runs serially under plain @njit
        if running > max_value:
            cumsum.append([i, running])  # position doubles as the label
            running = 0
        running += seq[i]
    cumsum.append([i, running])  # i is still the last position here
    return cumsum

lst = dynamic_cumsum2(df.iloc(axis=1)[0].values, 5)
pd.DataFrame(lst, columns=['A', 'B']).set_index('A')

    B
A
3  10
7   8
9   4
</code></pre>
<pre><code>%timeit foo(df, 5)
1.23 ms ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit dynamic_cumsum2(df.iloc(axis=1)[0].values, 5)
71.4 µs ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
</code></pre>
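<p><code>%timeit</code> is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard-library <code>timeit</code> module. The function below is a plain-Python stand-in with a hypothetical name and made-up data, so the numbers it prints are not comparable to the timings above. With an <code>@njit</code> function, the warm-up call matters because the first call includes compilation.</p>

```python
import timeit


def dynamic_cumsum_py2(seq, max_value):
    """Plain-Python stand-in for the function being timed (hypothetical name)."""
    out = []
    running = 0
    for i in range(len(seq)):
        if running > max_value:
            out.append((i, running))
            running = 0
        running += seq[i]
    if len(seq):
        out.append((len(seq) - 1, running))
    return out


data = [3, 1, 4, 1, 5, 9, 2, 6] * 1000  # made-up input

# Warm-up call: for an @njit function this is where compilation happens,
# so it should stay outside the timed region.
dynamic_cumsum_py2(data, 5)

number = 100
runs = timeit.repeat(lambda: dynamic_cumsum_py2(data, 5), number=number, repeat=5)
best_per_call = min(runs) / number
print(f"best of 5: {best_per_call * 1e6:.1f} us per call")
```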
<hr/>
<p><strong>Performance of the <code>njit</code> functions</strong></p>
<pre><code>import numpy as np
import pandas as pd
import perfplot

# cumsum_limit_nb: @jpp's generator version (defined elsewhere).
perfplot.show(
    setup=lambda n: pd.DataFrame(np.random.randint(0, 10, size=(n, 1))),
    kernels=[
        lambda df: list(cumsum_limit_nb(df.iloc[:, 0].values, 5)),
        lambda df: dynamic_cumsum2(df.iloc[:, 0].values, 5)
    ],
    labels=['cumsum_limit_nb', 'dynamic_cumsum2'],
    n_range=[2**k for k in range(0, 17)],
    xlabel='N',
    logx=True,
    logy=True,
    equality_check=None  # TODO - update when @jpp adds in the final `yield`
)
</code></pre>
<p>The log-log plot shows that the generator function is faster for larger inputs:</p>
<p><a href="https://i.stack.imgur.com/1rHET.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/1rHET.png" alt="Log-log perfplot of cumsum_limit_nb versus dynamic_cumsum2"/></a></p>
<p>One possible explanation is that, as N grows, the overhead of appending to an ever-growing list in <code>dynamic_cumsum2</code> becomes significant, whereas <code>cumsum_limit_nb</code> only has to <code>yield</code>.</p>
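<p>The structural difference between the two approaches can be sketched in plain Python. Both function names here are hypothetical, the sample values are made up, and the generator is only an illustration of the idea, not @jpp's actual <code>cumsum_limit_nb</code>. The list version has to hold every pair before returning; the generator hands each pair to the caller as soon as it is known.</p>

```python
def dynamic_cumsum_list(seq, max_value):
    """Collects every (position, total) pair in a list before returning."""
    out = []
    running = 0
    for i in range(len(seq)):
        if running > max_value:
            out.append((i, running))  # the list grows with the number of resets
            running = 0
        running += seq[i]
    if len(seq):
        out.append((len(seq) - 1, running))
    return out


def dynamic_cumsum_gen(seq, max_value):
    """Yields each (position, total) pair as soon as it is known."""
    running = 0
    for i in range(len(seq)):
        if running > max_value:
            yield i, running  # nothing is stored inside the function
            running = 0
        running += seq[i]
    if len(seq):
        yield len(seq) - 1, running  # final remainder


values = [2, 3, 5, 1, 4, 4, 2, 1]  # made-up sample data
as_list = dynamic_cumsum_list(values, 5)
as_gen = list(dynamic_cumsum_gen(values, 5))
print(as_list)  # [(3, 10), (6, 9), (7, 3)]
```

<p>Whether this accounts for the gap in the plot is still only a guess; wrapping the generator in <code>list(...)</code>, as the benchmark does, builds a list too, just outside the compiled function.</p>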