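I am trying to create a DataFrame of daily values, where each entry is based on a rolling 12-month slice of two larger datasets (called applications and assignments). At the moment I am not using df.rolling, because it does not feel right for this case. I simply loop over a date index and, for each date, run a few functions on the 12-month slice of the larger datasets. The returned values are appended to one list per variable, and those lists are then used to build the DataFrame. This is too slow.
I am looking for a way to speed up the creation of my rolling DataFrame.
This function computes the values for each rolling window: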
import pandas as pd
from dateutil.relativedelta import relativedelta

def calculate_rolling(apps, assgs, date_index):
    # dictionary containing lists of data - used later to create df
    values = {'owners': [],
              'successful_owners': [],
              'applications': [],
              'assignments': [],
              'filled_assignments': [],
              'sitters': [],
              'successful_sitters': []
              }
    for day in date_index:
        twelve_months_prior = day - relativedelta(months=12)
        app_view = apps.loc[str(twelve_months_prior):str(day.date())]  # slice of applications df
        assg_view = assgs.loc[str(twelve_months_prior):str(day.date())]  # slice of assignments df
        values['owners'].append(assg_view.ouser_id.nunique())
        # same column as 'owners' as posted; suser_id may have been intended
        values['sitters'].append(assg_view.ouser_id.nunique())
        values['applications'].append(app_view.request_id.count())
        # same expression as 'filled_assignments' as posted
        values['assignments'].append(assg_view.is_assignment_filled.sum())
        values['filled_assignments'].append(assg_view.is_assignment_filled.sum())
        values['successful_sitters'].append(assg_view[assg_view.is_assignment_filled == 1].suser_id.nunique())
        values['successful_owners'].append(assg_view[assg_view.is_assignment_filled == 1].ouser_id.nunique())
    return pd.DataFrame(data=values, index=date_index)
After creating a date-range index, I call it like this:
# create index of dates
index = pd.date_range(start=start, end=applications.created_date.max())
# create df from values dictionary
rolling_data = calculate_rolling(applications, assignments, index)
%timeit currently reports 23 seconds for calculate_rolling(). That causes problems when I use it in my Bokeh dashboard.
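For the count and sum columns, one possible way to avoid slicing the frame once per day is to sort the row timestamps a single time and locate each window with a binary search. The sketch below is only an illustration under assumptions: `rolling_window_sum` is a hypothetical helper, not part of pandas, and it assumes the date index holds midnight timestamps and that the window runs from `day - 12 months` through the end of `day`, as the string slices in the loop do.

```python
import numpy as np
import pandas as pd

def rolling_window_sum(timestamps, date_index, weights=None, months=12):
    """Sum `weights` (default: 1 per row) over rows whose timestamp lies in
    [day - months, end of day] for every day in `date_index`.
    Hypothetical helper; assumes `date_index` holds midnight timestamps."""
    ts = pd.DatetimeIndex(timestamps)
    w = np.ones(len(ts)) if weights is None else np.asarray(weights, dtype=float)
    keep = ~ts.isna()                       # rows without a timestamp never match
    ts, w = ts[keep], w[keep]
    order = np.argsort(ts.values, kind="stable")
    sorted_ts = ts.values[order]
    # prefix[i] is the total weight of the first i rows in time order
    prefix = np.concatenate(([0.0], np.cumsum(w[order])))
    starts = (date_index - pd.DateOffset(months=months)).values
    ends = (date_index + pd.Timedelta(days=1)).values   # exclusive upper bound
    lo = np.searchsorted(sorted_ts, starts, side="left")
    hi = np.searchsorted(sorted_ts, ends, side="left")
    return pd.Series(prefix[hi] - prefix[lo], index=date_index)
```

With the frames from the question this might be called as `rolling_window_sum(assignments.index, index, weights=assignments.is_assignment_filled)`, assuming `assignments` is indexed by the date used for slicing. Passing `weights=applications.request_id.notna()` would mimic `request_id.count()`. Whether this is fast enough for the dashboard would need to be measured on the real data.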
Sample data - applications:
   request_id  req_type  assignment_id  date_created  last_modified   oid    sid  ouser_id  suser_id  oconfirmed  sconfirmed    aid  created_date
0       30682       app             42    2016-04-13     2016-04-13   828   2329      1360      4822           0           1   42.0    2016-04-13
1        5718       app             52    2016-03-17     2016-03-17   220  18435       339     27455           1           1    NaN           NaT
2        5719       app             75    2016-03-17     2016-03-17   639  13645      1027     20691           1           1   75.0    2015-07-21
3        5720       app            245    2016-03-17     2016-03-17  2324  39096      5529     52883           1           1    NaN           NaT
4        5721       app            262    2016-03-17     2016-03-17  1343  39089      2918     52876           1           1  262.0    2015-08-16
Sample data - assignments:
              aid  created_date  start_date    end_date   oid  sid  ouser_id  suser_id  is_assignment_filled
created_date
2010-12-18      1    2010-12-18  2010-12-18  2011-03-05   104  NaN        87       NaN                 False
2010-12-11      2    2010-12-11  2010-12-11  2011-01-02   108  NaN        93       NaN                 False
2011-08-12      3    2011-08-12  2011-08-12  2011-08-28  1220  NaN      1972       NaN                 False
2011-01-09      4    2011-01-09  2011-01-09  2011-05-11   323  NaN       482       NaN                 False
2010-12-28      7    2010-12-28  2010-12-28  2011-01-31   142  NaN       169       NaN                 False
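The distinct-user columns (the `nunique()` calls on `ouser_id` and `suser_id`) cannot be built from a plain cumulative sum, because the same user may appear several times inside one window. One possible approach, sketched here under assumptions, is to work out for each row the run of days whose window contains it, trim that run where the same user's next row takes over, and accumulate +1/-1 markers. `rolling_nunique` is a hypothetical name, and the sketch assumes a sorted daily `date_index` of midnight timestamps.

```python
import numpy as np
import pandas as pd

def rolling_nunique(timestamps, ids, date_index, months=12):
    """For each day, count distinct ids having a row in [day - months, end of day].
    Hypothetical helper; assumes `date_index` is sorted, daily, at midnight."""
    df = pd.DataFrame({"t": pd.DatetimeIndex(timestamps), "id": np.asarray(ids)})
    df = df.dropna().sort_values(["id", "t"])
    days = date_index.values
    starts = (date_index - pd.DateOffset(months=months)).values
    # first: position of the first day whose window contains the row
    first = np.searchsorted(days, df["t"].dt.normalize().values, side="left")
    # last: one past the final day whose window start is still <= the row's timestamp
    last = np.searchsorted(starts, df["t"].values, side="right")
    # Cut each run short where the same id's next row starts, so that
    # overlapping rows of one id are counted only once per day.
    id_values = df["id"].values
    same_next = id_values[:-1] == id_values[1:]
    end = last.copy()
    end[:-1] = np.where(same_next, np.minimum(last[:-1], first[1:]), last[:-1])
    end = np.maximum(end, first)            # empty runs contribute nothing
    delta = np.zeros(len(days) + 1, dtype=np.int64)
    np.add.at(delta, first, 1)
    np.add.at(delta, end, -1)
    return pd.Series(np.cumsum(delta[:-1]), index=date_index)
```

For the "successful" variants, the rows could be filtered first, for example by passing only the rows where `is_assignment_filled` is true. Rows with a missing id or timestamp are dropped, which matches how `nunique()` ignores NaN.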