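<p>Here is a straightforward implementation. It sets up a Series,
<code>result</code>, whose index has minute frequency, then loops through the rows of
<code>df</code> (using <code>df.itertuples</code>) and adds each row's per-minute usage
to every minute in that row's interval:</p>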
<pre><code>import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Duration (Hours)': [2.233333333, 1.9, 7.216666667, 3.45, 1.6, 1.6],
    'Start Date': ['1/3/2016', '1/3/2016', '1/4/2016', '1/4/2016', '1/4/2016', '1/4/2016'],
    'Start Time': ['12:28:00 PM', '4:55:00 PM', '6:47:00 PM', '7:00:00 AM', '7:26:00 AM', '7:32:00 AM'],
    'Usage(kWh)': [6.23, 11.45, 11.93, 9.45, 7.33, 4.54]})

df['duration'] = pd.to_timedelta(df['Duration (Hours)'], unit='h')
df['start_date'] = pd.to_datetime(df['Start Date'] + ' ' + df['Start Time'])
df['end_date'] = df['start_date'] + df['duration']
# Despite the column name, this is energy per minute (kWh per minute of the interval)
df['power (kW/min)'] = df['Usage(kWh)']/(df['Duration (Hours)']*60)
df = df.drop(['Start Date', 'Start Time', 'Duration (Hours)'], axis=1)

# Float zeros, so that adding fractional values does not clash with an integer dtype
result = pd.Series(0.0,
    index=pd.date_range(df['start_date'].min(), df['end_date'].max(), freq='min'))
# +1 because position 0 of each tuple from itertuples() is the row's index label
power_idx = df.columns.get_loc('power (kW/min)')+1
for row in df.itertuples():
    result.loc[row.start_date:row.end_date] += row[power_idx]

# The sum of the usage over 15 minute windows is computed using the `resample/sum` method:
usage = result.resample('15min').sum()
usage.plot(kind='line', label='usage')
plt.legend(loc='best')
plt.show()
</code></pre>
<p><a href="https://i.stack.imgur.com/2lvUa.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/2lvUa.png" alt="Line plot of usage summed over 15-minute windows"/></a></p>
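<p>To see what the <code>resample</code>/<code>sum</code> step does in isolation, here is a small sketch on made-up data. The values and timestamps are hypothetical and chosen only so that the bins are easy to check by hand:</p>

```python
import pandas as pd

# Hypothetical toy data: 0.1 kWh used in each of 30 consecutive minutes.
idx = pd.date_range('2016-01-03 12:00', periods=30, freq='min')
per_minute = pd.Series(0.1, index=idx)

# Each 15-minute bin adds up the 15 per-minute values that fall inside it,
# so the 30 minutes collapse into two bins labelled 12:00 and 12:15.
per_quarter_hour = per_minute.resample('15min').sum()
print(per_quarter_hour)
```

<p>Because the bins only regroup the per-minute values, the total over all bins should equal the total of the per-minute series.</p>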
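<p><strong>A note on performance</strong>: looping through the rows of <code>df</code> is not very fast,
especially when <code>len(df)</code> is large. For better performance, you may need a
<a href="https://stackoverflow.com/a/31773404/190597">more clever method</a>, which handles
all the rows "at once" in a vectorized manner:</p>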
^{pr2}$
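<p>A minimal sketch of what such a vectorized approach could look like, assuming the same input columns as the example above. The name <code>using_cumsum</code> comes from the timings in this answer, but the body below is an assumption rather than the linked method verbatim: it marks where each interval switches on and off in a difference array, then lets a single <code>cumsum</code> turn those marks into per-minute values.</p>

```python
import numpy as np
import pandas as pd


def using_cumsum(df):
    """Per-minute energy series via a difference array and one cumsum (a sketch)."""
    one_min = pd.Timedelta(minutes=1)
    start = pd.to_datetime(df['Start Date'] + ' ' + df['Start Time'])
    end = start + pd.to_timedelta(df['Duration (Hours)'], unit='h')
    per_minute = (df['Usage(kWh)'] / (df['Duration (Hours)'] * 60)).to_numpy()

    t0 = start.min()
    index = pd.date_range(t0, end.max(), freq='min')

    # Position of the first index label >= start (ceiling division, written
    # as a negated floor division) and one past the last index label <= end.
    first = (-((t0 - start) // one_min)).to_numpy().astype(np.intp)
    stop = ((end - t0) // one_min).to_numpy().astype(np.intp) + 1

    # +value where an interval switches on, -value just after it switches off.
    delta = np.zeros(len(index) + 1)
    np.add.at(delta, first, per_minute)      # add.at accumulates repeated positions
    np.subtract.at(delta, stop, per_minute)

    # The running total is the sum of all intervals active at each minute.
    return pd.Series(delta.cumsum()[:-1], index=index)


df = pd.DataFrame({
    'Duration (Hours)': [2.233333333, 1.9, 7.216666667, 3.45, 1.6, 1.6],
    'Start Date': ['1/3/2016', '1/3/2016', '1/4/2016', '1/4/2016', '1/4/2016', '1/4/2016'],
    'Start Time': ['12:28:00 PM', '4:55:00 PM', '6:47:00 PM', '7:00:00 AM', '7:26:00 AM', '7:32:00 AM'],
    'Usage(kWh)': [6.23, 11.45, 11.93, 9.45, 7.33, 4.54]})

result = using_cumsum(df)
usage = result.resample('15min').sum()
```

<p>The intervals are treated as inclusive at both ends, matching the label-based slice used in the loop, so both approaches should give the same per-minute series up to floating-point rounding.</p>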
<hr/>
<p>When <code>len(df)</code> equals 1000, <code>using_cumsum</code> is over 10x faster than <code>using_loop</code>:</p>
<pre><code>In [117]: %timeit using_loop(df)
1 loop, best of 3: 545 ms per loop
In [118]: %timeit using_cumsum(df)
10 loops, best of 3: 52.7 ms per loop
</code></pre>