在哪里可以改进代码以缩短其执行时间?

2024-10-02 14:29:54 发布

您现在位置:Python中文网/ 问答频道 /正文

请求from HackerRank

If the amount spent by a client on a particular day is greater than or equal to 2× the client's median spending for a trailing number of days, they send the client a notification about potential fraud. The bank doesn't send the client any notifications until they have at least that trailing number of prior days' transaction data.

Given the number of trailing days d and a client's total daily expenditures for a period of n days, determine the number of times the client will receive a notification over all n days.

我的代码可以解决这个问题,但是对于大型测试用例来说有时间限制。我的代码不能通过时间限制要求。我的代码实际上很短:

from statistics import median

first_multiple_input = input().rstrip().split()
n = int(first_multiple_input[0])
d = int(first_multiple_input[1])
expenditure = list(map(int, input().rstrip().split()))
count=0
for i in range(len(expenditure)-d):
    if expenditure[d+i] >= 2*median(expenditure[i:d+i]) :
        count+=1
print( count)

请指出造成延误的原因以及如何改进

帮助理解代码的小测试用例:

9 5                 expenditure[] size n =9, d = 5
2 3 4 2 3 6 8 4 5   expenditure = [2, 3, 4, 2, 3, 6, 8, 4, 5]

Tags: ofthe代码clientnumberforinputcount
1条回答
网友
1楼 · 发布于 2024-10-02 14:29:54
分析/想法

你的median(expenditure[i:d+i])是罪魁祸首,因为每次sorting大小为d的整个未排序切片都需要O(d log d)时间。您可以通过将尾部元素的当前窗口保持在SortedList中,将其减少为O(logd)。从中间的一两个元素中获得中间值,要更新,只需添加一个新元素并删除最旧的元素

实施

from sortedcontainers import SortedList

n = 9
d = 5
expenditure = [2, 3, 4, 2, 3, 6, 8, 4, 5]

count = 0
trailing = SortedList(expenditure[:d])
half = d // 2
for i in range(d, n):
    median = (trailing[half] + trailing[~half]) / 2
    if expenditure[i] >= 2 * median:
        count += 1
    trailing.add(expenditure[i])
    trailing.remove(expenditure[i - d])
print(count)

我们可以省略/ 22 *,但是“median”将是错误的名称,并且naming things is hard。我们可以做if expenditure[i] >= trailing[half] + trailing[~half],但我觉得不太清楚

输出

如果你加上

    print(f'{trailing=} {median=} {expenditure[i]=}')

median = ...行之后,您可以看到发生了什么:

trailing=SortedList([2, 2, 3, 3, 4]) median=3.0 expenditure[i]=6
trailing=SortedList([2, 3, 3, 4, 6]) median=3.0 expenditure[i]=8
trailing=SortedList([2, 3, 4, 6, 8]) median=4.0 expenditure[i]=4
trailing=SortedList([2, 3, 4, 6, 8]) median=4.0 expenditure[i]=5
2

替代实施方案

使用zip代替索引:

count = 0
trailing = SortedList(expenditure[:d])
half = d // 2
for today, oldest in zip(expenditure[d:], expenditure):
    median = (trailing[half] + trailing[~half]) / 2
    if today >= 2 * median:
        count += 1
    trailing.add(today)
    trailing.remove(oldest)
print(count)

替代数据结构:已排序的常规列表

我发现了问题at HackerRank,它没有sortedcontainers。但是下面的内容在那里被接受了

我们可以使用常规Python list,但在Python标准库中包含的sortedbisect的帮助下,我们自己对其进行排序:

from bisect import bisect_left, insort

count = 0
trailing = sorted(expenditure[:d])
half = d // 2
for today, oldest in zip(expenditure[d:], expenditure):
    median = (trailing[half] + trailing[~half]) / 2
    if today >= 2 * median:
        count += 1
    insort(trailing, today)
    del trailing[bisect_left(trailing, oldest)]
print(count)

访问中间元素需要O(1)个时间,查找插入/删除索引需要O(logd)个时间,实际插入/删除需要O(d)个时间(因为它需要移动索引右侧的所有元素)。但是O(d)移动非常快,非常低

还有两个:按数组排序和计数排序

这个问题最初没有提到HackerRank。现在我看到值被限制为0到200之间的整数,我们还可以使用bytearray

trailing = bytearray(sorted(expenditure[:d]))

正如我刚才在讨论中指出的,对于这个允许值的范围,我们也可以使用计数排序的形式。我想一个Fenwick tree会让这个速度特别快,我可能稍后再试试

基准

在你提到的评论中,n=200000和d=10122是一个大案例。所以我用这些数据进行了测试:

n = 200000
d = 10122
expenditure = random.choices(range(201), k=n)

我的解决方案的基准:

                       at replit.com   on my weak laptop
SortedList + indexing   ~1.8 seconds    ~6.4 seconds
SortedList + zipping    ~1.8 seconds    ~6.4 seconds
sorted regular list     ~0.6 seconds    ~8.8 seconds
sorted bytearray        ~0.3 seconds    ~1.7 seconds

我不知道为什么常规列表解决方案在我的笔记本电脑上相对较慢。我怀疑它超过了CPU的1级缓存

相关问题 更多 >