What I have in the DataFrame:
email user_name sessions ymo
a@a.com JD 1 2015-03-01
a@a.com JD 2 2015-05-01
What I need:
email user_name sessions ymo
a@a.com JD 0 2015-01-01
a@a.com JD 0 2015-02-01
a@a.com JD 1 2015-03-01
a@a.com JD 0 2015-04-01
a@a.com JD 2 2015-05-01
a@a.com JD 0 2015-06-01
a@a.com JD 0 2015-07-01
a@a.com JD 0 2015-08-01
a@a.com JD 0 2015-09-01
a@a.com JD 0 2015-10-01
a@a.com JD 0 2015-11-01
a@a.com JD 0 2015-12-01
The ymo column holds pd.Timestamp values:
all_ymo
[Timestamp('2015-01-01 00:00:00'),
Timestamp('2015-02-01 00:00:00'),
Timestamp('2015-03-01 00:00:00'),
Timestamp('2015-04-01 00:00:00'),
Timestamp('2015-05-01 00:00:00'),
Timestamp('2015-06-01 00:00:00'),
Timestamp('2015-07-01 00:00:00'),
Timestamp('2015-08-01 00:00:00'),
Timestamp('2015-09-01 00:00:00'),
Timestamp('2015-10-01 00:00:00'),
Timestamp('2015-11-01 00:00:00'),
Timestamp('2015-12-01 00:00:00')]
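For reference, a list like all_ymo does not have to be typed by hand. A minimal sketch, assuming the target range is the twelve month starts of 2015 shown above:

```python
import pandas as pd

# 'MS' is the month-start frequency; the 2015 bounds are taken from the example above
all_ymo = list(pd.date_range('2015-01-01', '2015-12-01', freq='MS'))

print(len(all_ymo), all_ymo[0], all_ymo[-1])
```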
Unfortunately, this answer: Adding values for missing data combinations in Pandas is not suitable, because it creates duplicates for the existing ymo values.
I tried something like this, but it is very slow:
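import pandas as pd

# all_emails, all_ymo, sessions and col_names (presumably ['email', 'ymo']) are defined elsewhere
pieces = [sessions]
for em in all_emails:
    # months this email already has, normalized to midnight
    existent_ymo = set(t.normalize() for t in sessions[sessions['email'] == em]['ymo'])
    # months from all_ymo that this email is missing
    missing_ymo = sorted(set(all_ymo) - existent_ymo)
    multi_ind = pd.MultiIndex.from_product([[em], missing_ymo], names=col_names)
    pieces.append(sessions.set_index(col_names).reindex(multi_ind, fill_value=0).reset_index())
fill_ymo = pd.concat(pieces, ignore_index=True)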
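One way to avoid the per-email loop is to build the full grid of (user, month) pairs once and merge the real rows onto it. This is only a sketch and it swaps in a different technique from the loop above: a cross join via merge(how='cross'), which needs pandas 1.2 or later. The sample frame and the 2015 month range are assumptions that mirror the example data.

```python
import pandas as pd

# Sample data mirroring the example; 'sessions' is the frame name used in the loop
sessions = pd.DataFrame({
    'email': ['a@a.com', 'a@a.com'],
    'user_name': ['JD', 'JD'],
    'sessions': [1, 2],
    'ymo': pd.to_datetime(['2015-03-01', '2015-05-01']),
})
all_ymo = pd.date_range('2015-01-01', '2015-12-01', freq='MS')

# Every distinct user paired with every month
users = sessions[['email', 'user_name']].drop_duplicates()
grid = users.merge(pd.DataFrame({'ymo': all_ymo}), how='cross')

# Attach the real counts; months without a match become NaN, then 0
filled = grid.merge(sessions, on=['email', 'user_name', 'ymo'], how='left')
filled['sessions'] = filled['sessions'].fillna(0).astype(int)
filled = filled[['email', 'user_name', 'sessions', 'ymo']]

print(filled)
```

Because the grid contains each (email, ymo) pair exactly once and the merge is a left join, existing months are not duplicated as long as the input itself has no duplicate pairs.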
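I tried to create a more general solution with periods. If necessary, the same can be done with datetimes.
The solution with datetimes uses reindex, then ffill and bfill for the columns ['email', 'user_name'], and fillna(0) for the column 'sessions'.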
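The datetime steps named above (reindex, ffill and bfill, fillna(0)) can be sketched roughly as follows. This is not the answer's original code, only an illustration under assumptions: the sample frame df, the 2015 month range, and the helper name fill_months are all hypothetical.

```python
import pandas as pd

# Sample data mirroring the question (assumed for illustration)
df = pd.DataFrame({
    'email': ['a@a.com', 'a@a.com'],
    'user_name': ['JD', 'JD'],
    'sessions': [1, 2],
    'ymo': pd.to_datetime(['2015-03-01', '2015-05-01']),
})
all_ymo = pd.date_range('2015-01-01', '2015-12-01', freq='MS')


def fill_months(group):
    """Hypothetical helper: expand one email's rows to every month in all_ymo."""
    # Reindex onto the full month list; months that were absent become NaN rows
    out = group.set_index('ymo').reindex(all_ymo)
    # Copy the identifying columns into the new rows from their neighbours
    out[['email', 'user_name']] = out[['email', 'user_name']].ffill().bfill()
    # Months without data count as zero sessions
    out['sessions'] = out['sessions'].fillna(0).astype(int)
    return out.rename_axis('ymo').reset_index()


# Process each email separately, then stitch the pieces back together
result = pd.concat(
    [fill_months(g) for _, g in df.groupby('email')],
    ignore_index=True,
)[['email', 'user_name', 'sessions', 'ymo']]

print(result)
```

A period-based variant would presumably follow the same shape, converting ymo with .dt.to_period('M') and reindexing against pd.period_range instead of pd.date_range.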