Efficiently aggregating a resampled collection of datetimes in a large pandas DataFrame

1 Answer

User

#1 · Posted on 2024-06-25 05:53:35

Perhaps we can optimize your solution by resampling only a single column ("Amount", the column of interest).

(df.groupby(["Name", "IncomeOutcome"])['Amount']
   .resample("M")
   .agg(['sum','size'])
   .rename({'sum':'Amount', 'size': 'MonthlyCount'}, axis=1)
   .reset_index(level=-1, drop=True)
   .reset_index())

        Name IncomeOutcome  Amount  MonthlyCount
0  Customer1        Income   400.0             2
1  Customer2        Income   100.0             1
2  Customer2       Outcome  -200.0             2

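If this is still too slow, then I think the problem could be that running resample separately inside each group slows things down. Maybe you can try grouping by all 3 predicates with a single groupby call. For the date resampling, try pd.Grouper.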

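(df.groupby(['Name', 'IncomeOutcome', pd.Grouper(freq='M')])['Amount']
   .agg([ ('Amount', 'sum'), ('MonthlyCount', 'size')])
   .reset_index(level=-1, drop=True)
   .reset_index())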

Performance-wise, this should be faster.
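To make the single-groupby idea concrete, here is a minimal, self-contained sketch. The sample data and the index name "Date" are assumptions made for illustration (chosen so the totals match the output table shown earlier), not the asker's real data. It also passes a `pd.offsets.MonthEnd()` object rather than the "M" string, since newer pandas releases prefer the "ME" alias; that substitution is mine, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not from the original post):
# a DatetimeIndex plus the three columns the answer works with.
df = pd.DataFrame(
    {
        "Name": ["Customer1", "Customer1", "Customer2", "Customer2", "Customer2"],
        "IncomeOutcome": ["Income", "Income", "Income", "Outcome", "Outcome"],
        "Amount": [100.0, 300.0, 100.0, -50.0, -150.0],
    },
    index=pd.to_datetime(
        ["2017-01-03", "2017-01-20", "2017-01-10", "2017-01-05", "2017-01-25"]
    ),
)
df.index.name = "Date"  # hypothetical index name

# Month-end offset object; sidesteps the "M" vs "ME" alias difference
# between pandas versions.
month_end = pd.offsets.MonthEnd()

# One groupby over all three keys: the two columns plus a monthly bin
# on the DatetimeIndex via pd.Grouper.
monthly = (
    df.groupby(["Name", "IncomeOutcome", pd.Grouper(freq=month_end)])["Amount"]
    .agg(["sum", "size"])
    .rename({"sum": "Amount", "size": "MonthlyCount"}, axis=1)
)

# Same shape as the answer's output: the month level is dropped.
summary = monthly.reset_index(level=-1, drop=True).reset_index()

# Variant that keeps the month as a regular column.
with_month = monthly.reset_index()

print(summary)
print(with_month)
```

Dropping the last index level, as the answer does, is fine when each group covers a single month. If the data spans several months, the `with_month` variant is probably what you want, because otherwise rows for different months of the same customer become indistinguishable.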

Performance

For testing purposes, let's set up a more general DataFrame.

# Setup (assumes pandas is imported as pd and df is the DataFrame from the question)
df_ = df.copy()
df1 = pd.concat([df_.reset_index()] * 100, ignore_index=True)
df = pd.concat([
        df1.replace({'Customer1': f'Customer{i}', 'Customer2': f'Customer{i+1}'}) 
        for i in range(1, 98, 2)], ignore_index=True) 
df = df.set_index('index')

df.shape
# (24500, 3)

%%timeit 
(df.groupby(["Name", "IncomeOutcome"])['Amount']
   .resample("M")
   .agg(['sum','size'])
   .rename({'sum':'Amount', 'size': 'MonthlyCount'}, axis=1)
   .reset_index(level=-1, drop=True)
   .reset_index())

%%timeit
(df.groupby(['Name', 'IncomeOutcome', pd.Grouper(freq='M')])['Amount']
   .agg([ ('Amount', 'sum'), ('MonthlyCount', 'size')])
   .reset_index(level=-1, drop=True)
   .reset_index())

# resample version
1.71 s ± 85.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# pd.Grouper version
24.2 ms ± 1.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
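The `%%timeit` cells above only work in IPython or Jupyter. For a plain Python script, a rough equivalent can be sketched with the standard library `timeit` module. The data below is a small hypothetical stand-in (not the 24,500-row frame from the setup above), and `monthly_summary` is a helper name introduced here for illustration, so the numbers it prints are not comparable to the timings quoted in this answer.

```python
import timeit

import pandas as pd

# Hypothetical stand-in data (an assumption): five rows repeated 1000 times.
base = pd.DataFrame(
    {
        "Name": ["Customer1", "Customer1", "Customer2", "Customer2", "Customer2"],
        "IncomeOutcome": ["Income", "Income", "Income", "Outcome", "Outcome"],
        "Amount": [100.0, 300.0, 100.0, -50.0, -150.0],
    },
    index=pd.to_datetime(
        ["2017-01-03", "2017-01-20", "2017-01-10", "2017-01-05", "2017-01-25"]
    ),
)
df = pd.concat([base] * 1000)


def monthly_summary(frame):
    """Single groupby over Name, IncomeOutcome and a month-end bin (hypothetical helper)."""
    return (
        frame.groupby(
            ["Name", "IncomeOutcome", pd.Grouper(freq=pd.offsets.MonthEnd())]
        )["Amount"]
        .agg(["sum", "size"])
        .rename({"sum": "Amount", "size": "MonthlyCount"}, axis=1)
        .reset_index(level=-1, drop=True)
        .reset_index()
    )


# Three repeats of ten calls each; report the best per-call time.
timings = timeit.repeat(lambda: monthly_summary(df), number=10, repeat=3)
best_ms = min(timings) / 10 * 1000
print(f"best of 3: {best_ms:.2f} ms per call")
```

Taking the minimum over repeats, rather than the mean, is a common way to reduce noise from other processes on the machine.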
