数组中基于条件的最快求和方法

2024-09-29 23:24:45 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个如下所示的数组:

testdata = [-2, -1, 0, 1, 2, 3, 10, 3, 2, 1, 0, -1, -2]

所以它有一个最大值,在左边和右边,值逐渐下降到零,然后它的值可以低于0

我的代码的目的是找到数组的最大值,并将所有这些值向左和向右相加,直到值为0(包括最大值)

为了测试我的代码,我生成了这样一个较大的数组(忽略可能小于0的值):

data1 = [x for x in range(0, 100000, 1)]
data2 = [x for x in range(100000, -1, -1)]

data3 = data1 + data2

我能想出的最快代码如下所示:

j = 1
k = 0

max_index = np.where(data3 == np.amax(data3))[0][0]

while data3[max_index + j] > 0:
    j += 1

while data3[max_index - k] > 0:
    k += 1

summ1 = np.sum(data3[max_index:(max_index+j)]) 
summ2 = np.sum(data3[(max_index-k):max_index])
total = summ1 + summ2

print(total)

关于如何更快地提高这一点,有什么建议吗


Tags: 代码inforindexnprange数组max
2条回答

您可以使用掩蔽而不是使用循环

遮罩[data3[max_index:] > 0][data3[:max_index] > 0]等同于滑动[max_index:(max_index+j)][(max_index-k):max_index],除非您不必费心自己寻找jk

from contextlib import contextmanager
import numpy as np
import time

@contextmanager
def time_this_scope(name):
    """Handy context manager to time a portion of code."""
    t0 = time.perf_counter()
    yield
    print(f"{name} took {time.perf_counter() - t0}s.")

# Preparing the data.
data1 = [x for x in range(0, 100000, 1)]
data2 = [x for x in range(100000, -1, -1)]
data3 = data1 + data2
max_index = np.where(data3 == np.amax(data3))[0][0]
    
# Comparing the performance of both methods.
with time_this_scope("method 1"):
    j = 1
    k = 0
    while data3[max_index + j] > 0:
        j += 1    
    while data3[max_index - k] > 0:
        k += 1
    summ1 = np.sum(data3[max_index:(max_index+j)]) 
    summ2 = np.sum(data3[(max_index-k):max_index])
    total_m1 = summ1 + summ2

with time_this_scope("method 2"):    
    data3 = np.array(data3)
    summ1 = np.sum(data3[max_index:][data3[max_index:] > 0])
    summ2 = np.sum(data3[:max_index][data3[:max_index] > 0])      
    total_m2 = summ1 + summ2
    
# Checking they do yield the same result.
print(total_m1 == total_m2)

>>> method 1 took 0.08157979999998588s.
>>> method 2 took 0.011274500000013177s.
>>> True

这在很大程度上取决于数据。似乎您正在试图找到一种有效的方法来返回数组中某个内容的第一个索引。嗯,there isn't an efficient one in ^{}因为在numpy中只允许整个数组的迭代,而you can use ^{}是为了优于numpy

如果您需要对列表中的一大部分或一小部分求和,numpy是一个不错的选择:

zero_idx = np.where(data3==0)[0]
max_loc = np.searchsorted(zero_idx, np.argmax(data3))
start, end = zero_idx[max_loc - 1], zero_idx[max_loc]
total_sum = np.sum(data3[start:end])

否则,使用pythonicindex方法(或numba):

k = np.argmax(data3)
left_list = data3[k:].tolist()
right_list = data3[k::-1].tolist()
s1 = np.sum(data3[k: k + left_list.index(0)])
s2 = np.sum(data3[k - right_list.index(0): k])
total_sum = s1 + s2

基准。 我得到的第一种方法是在Jupyter笔记本中使用%timeitdecorator,速度快20倍:

512 µs ± 34.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
10.2 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

相关问题 更多 >

    热门问题