将列表中的每个元素与另一个列表中的两个元素进行比较,从而提高代码的效率

2024-09-29 02:29:00 发布

您现在位置:Python中文网/ 问答频道 /正文

我有两份清单:

threshold=[0.123,0.108,0.102,0.087]
retention=[0.19,1,0.57,5,0.09]

我想知道每个retention元素是否在threshold列表中

我的代码在此说明:

ca2=[(b>retention[0]>a) for b,a in zip(threshold[::1],threshold[1::1])]
ca3=[(b>retention[1]>a) for b,a in zip(threshold[::1],threshold[1::1])]
ca4=[(b>retention[2]>a) for b,a in zip(threshold[::1],threshold[1::1])]
ca5=[(b>retention[3]>a) for b,a in zip(threshold[::1],threshold[1::1])]
ca6=[(b>retention[4]>a) for b,a in zip(threshold[::1],threshold[1::1])]

如您所见,它请求retention[0]是否在threshold中的哪个元素之间

我需要比较retention中的每个元素。我的代码可以工作,但它是多余的,我认为没有效率。我还想自动与threshold中的另外两个元素进行比较。如果您能指导我或帮助我提高代码的效率,我将不胜感激,因为保留列表可能会更长


Tags: 代码in元素列表forthresholdzip效率
3条回答

不完全确定您想要实现什么,但是您可以使用bisect在阈值列表中进行二进制搜索,以找到刚好低于给定数字的阈值

retention = [0.19, 1, 0.57, 5, 0.09]
threshold = [0.123, 0.108, 0.102, 0.087]
threshold = [0] + sorted(threshold) # add 0 and sort
bins = {t: [] for t in threshold}
for r in retention:
    k = bisect.bisect(threshold, r) # actually, this is the next threshold
    bins[threshold[k-1]].append(r)  # thus k-1 here to get the lower one
# {0: [], 0.087: [0.09], 0.102: [], 0.108: [], 0.123: [0.19, 1, 0.57, 5]}

与另一个bisect答案(这会产生非常不同的输出)一样,对于retention中的k个元素,每个查询的复杂性是O(logn),n是阈值的数目,总共O(klogn)

您可以生成一个字典,其中保留值作为键,阈值比较列表作为值。此外,如果将zip对象强制转换为列表,则不需要每次迭代都创建它

t = list(zip(threshold, threshold[1:]))
print({i: [(b > i > a) for b, a in t] for i in retention})

要检查每个保留元素是否在阈值的两个元素之间,可以使用对分(即每次检查的日志(n)时间)

代码

from bisect import bisect_left

def binary_search(a, x): 
    """Index of where x would be inserted into a
       return None if x < min(a) or x > max(a)"""
    i = bisect_left(a, x)
    return i if i != len(a) and i > 0 else None

threshold = [0.123,0.108,0.102,0.087]
threshold_asc = threshold[::-1]
retention = [0.123, 0.19,1,0.57,5,0.09, 0.087]

for r in retention:
  print(f'{r} ->> {binary_search(threshold_asc, r)}')

输出

0.123 ->> 3
0.19 ->> None
1 ->> None
0.57 ->> None
5 ->> None
0.09 ->> 1
0.087 ->> None

复杂性

O(log(N)) for each check of retention. This is more efficient than walking the list of thresholds to find pairs of surrounding values which would be O(N).

相关问题 更多 >