I have two dataframes. DF1 contains the following:
User | Time interval
User01 | [01/01/2014 08:12:00, 01/01/2014 08:13:43]
User02 | [01/03/2014 07:21:44, 01/04/2014 01:07:01]
DF2 contains the following events:
User | Time | Value
User01 | 01/03/2014 04:11:00 | 9
User01 | 01/01/2014 08:10:00 | 12
User02 | 01/03/2014 09:11:00 | 3
User02 | 01/02/2014 11:10:00 | 21
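I would like to add three columns to DF1 containing, for each user, the mean, standard deviation, and maximum of the values of the DF2 events that fall within that user's time interval.
So the final result should look like this: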
User | Time interval | Mean | Max | StDev
User01 | [01/01/2014 08:12:00, 01/01/2014 08:13:43] | NaN | NaN | NaN
User02 | [01/03/2014 07:21:44, 01/04/2014 01:07:01] | 3 | 3 | 0
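If my tables are large, is there an efficient way to do this? Is there some kind of "groupby" function that works on time intervals taken from another dataframe?
Code: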
import pandas as pd

# Intervals, one per user
DF1 = pd.DataFrame({
    'User': pd.Series(["User01", "User02"], index=['1', '2']),
    'Time start': pd.Series(["01/01/2014 08:12:00", "01/03/2014 07:21:44"], index=['1', '2']),
    'Time end': pd.Series(["01/01/2014 08:13:43", "01/04/2014 01:07:01"], index=['1', '2']),
})

# Events
DF2 = pd.DataFrame({
    'User': pd.Series(["User01", "User01", "User02", "User02"], index=['1', '2', '3', '4']),
    'Time': pd.Series(["01/03/2014 04:11:00", "01/01/2014 08:10:00", "01/03/2014 09:11:00", "01/02/2014 11:10:00"], index=['1', '2', '3', '4']),
    'Value': pd.Series([9, 12, 3, 21], index=['1', '2', '3', '4']),
})

# Expected result (float("nan") rather than the string "Nan", so the columns stay numeric)
DF3 = pd.DataFrame({
    'User': pd.Series(["User01", "User02"], index=['1', '2']),
    'Time start': pd.Series(["01/01/2014 08:12:00", "01/03/2014 07:21:44"], index=['1', '2']),
    'Time end': pd.Series(["01/01/2014 08:13:43", "01/04/2014 01:07:01"], index=['1', '2']),
    'Mean': pd.Series([float("nan"), 3], index=['1', '2']),
    'Max': pd.Series([float("nan"), 3], index=['1', '2']),
    'StDev': pd.Series([float("nan"), 0], index=['1', '2']),
})
First, merge DF1 and DF2.
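The code for this step is not preserved here. A minimal sketch of what such a merge might look like, assuming the time strings are parsed as month-first timestamps and that the hour "011" in the sample data is a typo for "11" (both are assumptions, not stated in the question):

```python
import pandas as pd

FMT = "%m/%d/%Y %H:%M:%S"  # assumed month-first; the question does not say

DF1 = pd.DataFrame({
    "User": ["User01", "User02"],
    "Time start": pd.to_datetime(["01/01/2014 08:12:00", "01/03/2014 07:21:44"], format=FMT),
    "Time end": pd.to_datetime(["01/01/2014 08:13:43", "01/04/2014 01:07:01"], format=FMT),
})
DF2 = pd.DataFrame({
    "User": ["User01", "User01", "User02", "User02"],
    # "011:10:00" in the question is assumed to mean "11:10:00"
    "Time": pd.to_datetime(["01/03/2014 04:11:00", "01/01/2014 08:10:00",
                            "01/03/2014 09:11:00", "01/02/2014 11:10:00"], format=FMT),
    "Value": [9, 12, 3, 21],
})

# Pair every event with the interval of the same user.
df = DF1.merge(DF2, on="User")
print(df)
```

Note that this produces one row per (interval, event) pair for each user, so on large tables the intermediate frame can be considerably larger than either input.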
Create an indicator ("keep") that flags whether Time falls between Time start and Time end.
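One possible way to build this indicator is with `Series.between`. The sketch below starts from a hand-built copy of the merged frame, and treats both interval endpoints as inclusive, which is an assumption the question does not settle:

```python
import pandas as pd

FMT = "%m/%d/%Y %H:%M:%S"  # assumed month-first

# Merged frame from the previous step: one row per interval/event pair.
df = pd.DataFrame({
    "User": ["User01", "User01", "User02", "User02"],
    "Time start": pd.to_datetime(
        ["01/01/2014 08:12:00"] * 2 + ["01/03/2014 07:21:44"] * 2, format=FMT),
    "Time end": pd.to_datetime(
        ["01/01/2014 08:13:43"] * 2 + ["01/04/2014 01:07:01"] * 2, format=FMT),
    "Time": pd.to_datetime(["01/03/2014 04:11:00", "01/01/2014 08:10:00",
                            "01/03/2014 09:11:00", "01/02/2014 11:10:00"], format=FMT),
    "Value": [9, 12, 3, 21],
})

# 1 if the event time lies inside the row's interval (endpoints included), else 0.
df["keep"] = df["Time"].between(df["Time start"], df["Time end"]).astype(int)
print(df[["User", "Time", "Value", "keep"]])
```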
Output:
Now keep only the rows where keep = 1.
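A boolean mask is probably the simplest way to do this. The frame below is a hypothetical stand-in for the merged data with its "keep" column, reduced to the columns that matter here:

```python
import pandas as pd

# Stand-in for the merged frame after the indicator has been added.
df = pd.DataFrame({
    "User": ["User01", "User01", "User02", "User02"],
    "Value": [9, 12, 3, 21],
    "keep": [0, 0, 1, 0],
})

# Drop every event that falls outside its user's interval.
df = df[df["keep"] == 1]
print(df)
```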
Now aggregate df with groupby.
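A sketch of the aggregation using named aggregation; the output column names follow the question, and `kept` is a hypothetical name for the filtered rows. pandas' default `std` uses `ddof=1` and returns NaN for a single observation, so `ddof=0` is used here to match the StDev of 0 that the question expects for User02:

```python
import pandas as pd

# Stand-in for the rows that survived the keep == 1 filter.
kept = pd.DataFrame({"User": ["User02"], "Value": [3]})

stats = kept.groupby("User")["Value"].agg(
    Mean="mean",
    Max="max",
    StDev=lambda s: s.std(ddof=0),  # population std, so one value gives 0 instead of NaN
)
print(stats)
```

Users with no events inside their interval (User01 here) do not appear in `stats` at all; they only reappear when the statistics are joined back onto DF1.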
Output:
Merge with df4.
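`df4` is not defined in the surviving text. The sketch below assumes the intent is to join the per-user statistics back onto DF1 with a left merge, so that users without any matching events get NaN in the new columns; the frame name `stats` is hypothetical:

```python
import pandas as pd

DF1 = pd.DataFrame({
    "User": ["User01", "User02"],
    "Time start": ["01/01/2014 08:12:00", "01/03/2014 07:21:44"],
    "Time end": ["01/01/2014 08:13:43", "01/04/2014 01:07:01"],
})
# Hypothetical per-user statistics from the groupby step; User01 had no events in range.
stats = pd.DataFrame({"User": ["User02"], "Mean": [3.0], "Max": [3], "StDev": [0.0]})

# how="left" keeps every interval in DF1, filling missing statistics with NaN.
result = DF1.merge(stats, on="User", how="left")
print(result)
```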
Output: