如何最好地通过一个大的时间戳事件列表来解析并发条目？

EVENTS[tag][end][start] = [list of occurrences at that start/end time stamp] endkey = EVENTS[tag].keys() endkey.sort() peak = 0 for end in endkey: endentrykey = EVENTS[tag].keys() endentrykey.sort() for endtime in endentrykey: if endtime < end: # We can disregard entries that ended before the event break startentrykey = EVENTS[tag][item].keys() startentrykey.sort() for starttime in startentrykey: if starttime > end: # ignore events that started after the event ended break peak = len(EVENT[tag][endtime][starttime])

1条回答

网友

1楼 · 发布于 2024-09-23 16:28:17

您可以在python评测中查看这个问题的答案question，以帮助您识别瓶颈。有一点排序正在进行，这可能是次优的

如果您的事件数据是按开始时间排序的，那么您应该能够按顺序遍历事件，并向前和向后查看以确定重叠事件的计数

这里有一个例子，我的意思是，我生成150000个随机持续时间的连续开始事件。然后以上面描述的方式遍历事件，并打印最具并发性的前十个事件

>python py_counter.py
(7548: start 1546268748.0, duration: 89.0, 47)
(7547: start 1546268747.0, duration: 8.0, 46)
(7546: start 1546268746.0, duration: 86.0, 45)
(7545: start 1546268745.0, duration: 36.0, 44)
(7544: start 1546268744.0, duration: 21.0, 43)
(59089: start 1546320289.0, duration: 25.0, 43)
(7543: start 1546268743.0, duration: 75.0, 42)
(59088: start 1546320288.0, duration: 96.0, 42)
(104116: start 1546365316.0, duration: 70.0, 42)
(7542: start 1546268742.0, duration: 94.0, 41)
Finished 150,000 events in 5.1 seconds

代码如下：

from datetime import datetime, timedelta
from dataclasses import dataclass
from collections import Counter
from random import randint
import time


@dataclass(eq=True, frozen=True)  # make hashable
class Event:
    """Track event durations"""
    tag: int
    start: datetime
    end: datetime

    def __repr__(self):
        return f"{self.tag}: start {self.start.timestamp()}, duration: {self.end.timestamp() - self.start.timestamp()}"

start_time = time.time()
# create some sequentially starting events with random durations
events = []
for i in range(150000):
    start = datetime(2019, 1, 1) + timedelta(seconds=i)
    end = start + timedelta(seconds=randint(1, 100))
    events.append(Event(i, start, end))

concurrencies = Counter()
#events is a list sorted by start-time
for i, event in enumerate(events):
    # look back
    j = i - 1
    while j > 0 and event.start < events[j].end:
        j -= 1
        concurrencies[event] += 1
    # look forward
    j = i + 1
    max_index = len(events) - 1
    while j < max_index and event.end < events[i].start:
        j += 1
        concurrencies[event] += 1
# print events with highest number of concurrencies
for event in concurrencies.most_common(10):
    print(event)
print(f"Finished {len(events):,d} events in {time.time() - start_time:.1f} seconds")

如果有需要澄清的地方，请告诉我

相关问题更多 >

编程相关推荐

热门问题

热门文章