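<p>Here's one option. Since you are adding whole months, we can compute the new year/month/day in a vectorized way using integer arithmetic, and then build the datetimes from those year/month/day combinations:</p>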
<pre><code>import numpy as np
import pandas as pd

def f_proposed(df):
    z = df.copy()
    # assumes df has an unnamed index, which reset_index() turns into an 'index' column
    z = z.reset_index()
    # repeat each row (and its xdate) as many times as its number of periods
    z = z.loc[np.repeat(z.index, z['periods'])]
    # k = number of months to add: 0, interval, 2 * interval, ...
    z['k'] = z.groupby(level=0).cumcount() * z['interval']
    # calculate the new year/month with integer arithmetic
    z['year'] = (z['xdate'].dt.year * 12 + z['xdate'].dt.month - 1 + z['k']) // 12
    z['month'] = (z['xdate'].dt.month - 1 + z['k']) % 12 + 1
    # clip the day to the number of days in the target month
    # (e.g. Jan 31 plus one month becomes Feb 28 or Feb 29)
    z['days_in_month'] = pd.to_datetime(
        z['year'].astype(str) + '-' + z['month'].astype(str) + '-01').dt.days_in_month
    z['day'] = np.clip(z['xdate'].dt.day, 1, z['days_in_month'])
    z['sdates'] = pd.to_datetime(z[['year', 'month', 'day']])
    # drop the temporary columns and restore the original index
    z = z.set_index('index').drop(columns=['k', 'year', 'month', 'day', 'days_in_month'])
    return z
</code></pre>
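<p>For a single date, the same arithmetic can be sketched in plain Python. Counting months from year 0 turns the carry into the year into a single floor division. The month formula <code>(month - 1 + k) % 12 + 1</code> used above gives the same result as the remainder here, because <code>year * 12</code> is always a multiple of 12. The helper name <code>add_months</code> is hypothetical. <code>calendar.monthrange</code> is used only to look up the length of the target month, playing the role of <code>.dt.days_in_month</code> in the vectorized code.</p>
<pre><code>import calendar


def add_months(year, month, day, k):
    """Hypothetical scalar helper: add k months, clipping the day to month end."""
    # Count months since year 0 so the carry into the year is one floor division.
    total = year * 12 + (month - 1) + k
    new_year, month0 = divmod(total, 12)
    new_month = month0 + 1
    # calendar.monthrange returns (weekday of the 1st, number of days in month).
    days_in_month = calendar.monthrange(new_year, new_month)[1]
    return new_year, new_month, min(day, days_in_month)


print(add_months(2020, 1, 31, 1))   # (2020, 2, 29): Jan 31 + 1 month, leap year
print(add_months(2019, 11, 15, 3))  # (2020, 2, 15): carries into the next year
</code></pre>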
<p>To compare performance with the original, I generated a test dataset with 10,000 rows.</p>
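<p>The exact generator isn't shown here, so the following is only a hypothetical sketch of how a comparable frame could be built. The column names <code>xdate</code>, <code>periods</code> and <code>interval</code> come from <code>f_proposed</code>. The date range and the ranges for <code>periods</code> and <code>interval</code> are assumptions, as is the function name <code>make_test_df</code>.</p>
<pre><code>import numpy as np
import pandas as pd


def make_test_df(n_rows, seed=0):
    """Hypothetical generator for a frame with the columns f_proposed expects."""
    rng = np.random.default_rng(seed)
    # Assumed: start dates spread over roughly 20 years from 2000-01-01.
    offsets = rng.integers(0, 365 * 20, size=n_rows)
    return pd.DataFrame({
        'xdate': pd.Timestamp('2000-01-01') + pd.to_timedelta(offsets, unit='D'),
        # Assumed: 1 to 12 generated dates per row.
        'periods': rng.integers(1, 13, size=n_rows),
        # Assumed: 1 to 3 months between consecutive generated dates.
        'interval': rng.integers(1, 4, size=n_rows),
    })


z = make_test_df(10_000)
</code></pre>
<p>With a frame like this, <code>%timeit f_proposed(z)</code> can be run directly on <code>z</code>.</p>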
<p>Here are my timings (roughly a 23x speedup at 10K rows):</p>
<pre><code>%timeit f_proposed(z)
82.7 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit f_original(z)
1.92 s ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
</code></pre>
<p>Also, for 170K rows, <code>f_proposed</code> takes about 1.39 s on my machine, versus about 33.6 s for <code>f_original</code> (roughly 24x faster).</p>
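<p><code>f_original</code> isn't included here, so this sketch uses a hypothetical per-row loop as a stand-in baseline. The loop is built on <code>pd.DateOffset(months=...)</code>, which also clips to the end of shorter months. The sketch checks that the loop yields the same dates as a compact variant of the vectorized approach. It then times both with the <code>timeit</code> module instead of IPython's <code>%timeit</code>. The names <code>f_vectorized</code> and <code>f_loop</code> and the sample data are illustrative. The variant differs from <code>f_proposed</code> mainly in how it gets each month's length: it builds the first-of-month dates from a year/month/day frame rather than from formatted strings. Timings will vary by machine.</p>
<pre><code>import timeit

import numpy as np
import pandas as pd


def f_vectorized(df):
    """Compact variant of the vectorized approach (assumes an unnamed index)."""
    z = df.reset_index()
    z = z.loc[np.repeat(z.index, z['periods'])]
    # k = 0, interval, 2 * interval, ... within each original row
    k = (z.groupby(level=0).cumcount() * z['interval']).to_numpy()
    start = z['xdate']
    total = start.dt.year.to_numpy() * 12 + start.dt.month.to_numpy() - 1 + k
    year, month = total // 12, total % 12 + 1
    # Month lengths via the first day of each target month (no string formatting).
    first = pd.to_datetime(pd.DataFrame({'year': year, 'month': month, 'day': 1}))
    day = np.minimum(start.dt.day.to_numpy(), first.dt.days_in_month.to_numpy())
    z['sdates'] = pd.to_datetime(
        pd.DataFrame({'year': year, 'month': month, 'day': day})).to_numpy()
    return z.set_index('index')


def f_loop(df):
    """Hypothetical slow baseline: one pd.DateOffset addition per generated date."""
    rows = []
    for idx, row in df.iterrows():
        for i in range(int(row['periods'])):
            # DateOffset(months=...) also clips to the end of shorter months.
            offset = pd.DateOffset(months=int(i * row['interval']))
            rows.append((idx, row['xdate'] + offset))
    return pd.DataFrame(rows, columns=['index', 'sdates']).set_index('index')


df = pd.DataFrame({
    'xdate': pd.to_datetime(['2020-01-31', '2019-11-15', '2021-08-31']),
    'periods': [3, 2, 4],
    'interval': [1, 3, 13],
})

# Both versions should produce identical dates in the same order.
assert np.array_equal(f_vectorized(df)['sdates'].to_numpy(),
                      f_loop(df)['sdates'].to_numpy())

# Rough timing on a larger frame; absolute numbers depend on the machine.
big = pd.concat([df] * 500, ignore_index=True)
t_vec = timeit.timeit(lambda: f_vectorized(big), number=3) / 3
t_loop = timeit.timeit(lambda: f_loop(big), number=1)
print(f'vectorized: {t_vec:.4f} s, loop: {t_loop:.4f} s')
</code></pre>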