How can I speed up this nested loop (index by date?)

   email             event_date                  event_type
0  4867784685125632  2015-10-26 21:38:03.911350  delivered
1  5352432066363392  2015-10-26 21:37:57.871980  delivered
2  6024938649550848  2015-10-26 21:37:57.853210  purchase
3  6191500064980992  2015-10-26 21:37:58.867800  delivered
4  4867784685125632  2015-10-28 21:37:56.331130  purchase

attributed_purchases = []
count = 0
for idx_e, row_e in delivered.iterrows():
    purch = 0
    rev = 0
    for idx_p, row_p in purchased.iterrows():
        if delivered.loc[idx_e, 'email'] != purchased.loc[idx_p, 'email']:
            pass
        elif (purchased.loc[idx_p, 'event_date'] >= delivered.loc[idx_e, 'event_date']) and purchased.loc[idx_p, 'event_date'] <= (delivered.loc[idx_e, 'event_date'] + timedelta(days=5)):
            purch += 1
            print('I just found a purchase')
    attributed_purchases.append(purch)
    count += 1
    print(f'Completed iteration {count}')
delivered['attributed_purchases'] = attributed_purchases

1 answer

User

#1 · Posted on 2024-09-26 18:05:57

It's hard to be precise without knowing the more specific requirements, but here are some high-level suggestions:

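Use break in the inner loop once a purchase has been found, so the remaining items are not processed unnecessarily.
Prune the purchase list as deliveries are filled, so that it shrinks over time and future iterations of the outer loop do not process items that have already been attributed.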
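
The two suggestions above can be sketched in plain Python as follows. This is only an illustration under assumptions that are not in the question: events are held as (email, date) tuples rather than DataFrame rows, each purchase is attributed to at most one delivery, and the names attribute_purchases, deliveries, purchases and window are hypothetical. Note that breaking on the first match records whether a purchase was found, not how many.

```python
from datetime import datetime, timedelta

def attribute_purchases(deliveries, purchases, window=timedelta(days=5)):
    """Return, per delivery, 1 if a purchase followed within `window`, else 0.

    Hypothetical helper: both inputs are lists of (email, datetime) tuples.
    """
    remaining = list(purchases)  # working copy that shrinks as purchases are attributed
    attributed = []
    for d_email, d_date in deliveries:
        found = 0
        for i, (p_email, p_date) in enumerate(remaining):
            if p_email == d_email and d_date <= p_date <= d_date + window:
                found = 1
                del remaining[i]  # prune: later deliveries never rescan this purchase
                break             # stop at the first match instead of scanning the rest
        attributed.append(found)
    return attributed

# Made-up sample data, loosely modelled on the table in the question
deliveries = [
    ("a", datetime(2015, 10, 26)),
    ("b", datetime(2015, 10, 26)),
    ("a", datetime(2015, 10, 27)),
]
purchases = [
    ("a", datetime(2015, 10, 28)),
    ("c", datetime(2015, 10, 26)),
]
result = attribute_purchases(deliveries, purchases)
print(result)
```

The second delivery for "a" gets 0 here because the only matching purchase was already consumed by the first one; whether that is the desired attribution rule depends on the requirements.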

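That said, as in my comment, I still think a simple single-loop approach would be more efficient. To handle multiple, possibly overlapping purchases for one email, you can simply store them in a list (a defaultdict(list) is convenient) and manage those lists as you go. This also ensures that a single delivery does not fulfil more than one purchase; if you do want that, just change the whole try block to count += bool(pending[ehash]).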

import datetime
from collections import defaultdict

emails = ((e[0], datetime.datetime.strptime(e[1], '%Y-%m-%d'), e[2])
    for e in (
      (1, '2019-01-01', 'delivered'),  # ignored, no prior purchase
      (1, '2019-01-02', 'purchase'),
      (1, '2019-01-03', 'purchase'),
      (1, '2019-01-04', 'delivered'),  # matches [1], count == 1
      (1, '2019-01-05', 'purchase'),
      (1, '2019-01-06', 'delivered'),  # matches [2], count == 2
      (1, '2019-01-20', 'delivered')   # ignored, too long since last purchase
    )
)
count = 0
pending = defaultdict(list)

for (ehash, date, status) in sorted(emails, key=lambda e: e[1]):

    # record a purchase awaiting delivery
    if status == 'purchase':
        pending[ehash].append(date)

    elif status == 'delivered':
        # purge any purchases for this email that are 5 or more days old
        pending[ehash] = [p_date for p_date in pending[ehash]
                         if p_date > date - datetime.timedelta(days=5)]

        # then consume the oldest remaining purchase (less than 5 days old) and increment the count
        try:
            del pending[ehash][0]
            count += 1
        except IndexError:
            pass # No valid purchase for this delivery

print(count)
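
To make the difference between the try block and the count += bool(pending[ehash]) variant concrete, here is a small sketch that wraps the same loop in a function. The function name count_deliveries and the consume and max_age parameters are hypothetical, introduced only for this comparison.

```python
import datetime
from collections import defaultdict

def count_deliveries(events, consume=True, max_age=datetime.timedelta(days=5)):
    """Count deliveries that follow a purchase for the same email within max_age.

    events: iterable of (email_id, datetime, status) tuples.
    consume=True behaves like the try block (each purchase matches one delivery);
    consume=False uses count += bool(pending[ehash]) instead.
    """
    count = 0
    pending = defaultdict(list)
    for ehash, date, status in sorted(events, key=lambda e: e[1]):
        if status == 'purchase':
            pending[ehash].append(date)
        elif status == 'delivered':
            # drop purchases that are max_age or older
            pending[ehash] = [p for p in pending[ehash] if p > date - max_age]
            if consume:
                if pending[ehash]:
                    del pending[ehash][0]  # oldest valid purchase is used up
                    count += 1
            else:
                count += bool(pending[ehash])  # purchase stays available
    return count

def day(d):
    return datetime.datetime(2019, 1, d)

# One purchase followed by two deliveries inside the 5-day window
events = [
    (1, day(2), 'purchase'),
    (1, day(3), 'delivered'),
    (1, day(4), 'delivered'),
]
print(count_deliveries(events, consume=True))
print(count_deliveries(events, consume=False))
```

With consume=True the single purchase is matched by the first delivery only; with consume=False it stays in the list, so both deliveries are counted.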
