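I am trying to find an efficient way to solve the following problem:
I have two DataFrames, each containing the following data:
The first: id, date, value
Sample data: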
id, date, value
f130,200701,0.016196
f130,200702,-0.027798
f130,200703,-0.014868
f130,200704,0.017801
f130,200705,-0.032700
f130,200706,0.049529
f130,200707,0.011610
f130,200708,-0.008145
f130,200709,-0.001493
f130,200710,0.009719
f130,200711,-0.007775
f130,200712,-0.007835
f131,200701,0.044754
f131,200702,0.004679
f131,200703,-0.011824
f131,200704,0.007252
f131,200705,0.029877
f131,200706,0.001748
f131,200707,0.001047
f131,200708,-0.003137
f131,200709,0.001748
f131,200710,0.006632
f131,200711,-0.012136
f131,200712,0.004914
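The second: id_2, date, value (with ids such as d1 and d2)
What I need is the rolling-window correlation between the two value columns (rolling over the date column) for every pair of id and id_2.
Basically, my output should be: "id vs id_2", date, corr
So for d1 vs f130 at 200706, I compute the correlation between d1 and f130 over the window ending at 200706. The same goes for every pair.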
Expected output:
id_pair, date, value
d1_f130,200706,-0.375238392
d1_f130,200707,-0.667154011
d1_f130,200708,-0.636064899
d1_f130,200709,-0.672029012
d1_f130,200710,-0.653719992
d1_f130,200711,-0.802893705
d1_f130,200712,-0.03120143
d1_f131,200706,0.870717009
d1_f131,200707,0.61076152
d1_f131,200708,0.400632396
d1_f131,200709,0.05064842
d1_f131,200710,0.087102168
d1_f131,200711,-0.012306865
d1_f131,200712,0.05170204
d2_f130,200706,-0.170979922
d2_f130,200707,-0.15363222
d2_f130,200708,-0.089709021
d2_f130,200709,-0.227564277
d2_f130,200710,-0.252391258
d2_f130,200711,0.94878745
d2_f130,200712,0.619029635
d2_f131,200706,0.358385975
d2_f131,200707,0.952074283
d2_f131,200708,0.930805345
d2_f131,200709,0.919101445
d2_f131,200710,0.904473885
d2_f131,200711,0.47080201
d2_f131,200712,0.640334152
Looping over the ids and dates with for loops takes days... (id: ~15000 values, id_2: ~300 values, dates: ~300)
Any ideas?
Assume you have two DataFrames like the following:
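A minimal sketch of such a setup, assuming the names df1 and df2 (they are not given in the question). The df1 rows are taken from the question's sample data; the df2 values are made-up placeholders, because the question's second sample is not reproduced here.

```python
import pandas as pd

# First frame: a few rows copied from the question's sample data.
df1 = pd.DataFrame({
    "id": ["f130"] * 3 + ["f131"] * 3,
    "date": [200701, 200702, 200703] * 2,
    "value": [0.016196, -0.027798, -0.014868,
              0.044754, 0.004679, -0.011824],
})

# Second frame: same layout, but keyed by id_2.
# These values are invented placeholders, NOT the question's data.
df2 = pd.DataFrame({
    "id_2": ["d1"] * 3 + ["d2"] * 3,
    "date": [200701, 200702, 200703] * 2,
    "value": [0.01, -0.02, 0.03, 0.00, 0.01, -0.01],
})
```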
Then you can merge the two into a single df, and then group by the ids and apply a rolling correlation:
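The merge and the grouped rolling correlation could look roughly like the sketch below. It is not necessarily the exact code of this answer. The data is random stand-in data, the names df1, df2, WINDOW and result are assumptions, and the window of 6 is inferred from the expected output starting at 200706 while the data starts at 200701.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (random values, not the question's data).
rng = np.random.default_rng(0)
dates = [200701 + i for i in range(12)]  # 200701 .. 200712
df1 = pd.DataFrame({
    "id": np.repeat(["f130", "f131"], len(dates)),
    "date": dates * 2,
    "value": rng.normal(size=2 * len(dates)),
})
df2 = pd.DataFrame({
    "id_2": np.repeat(["d1", "d2"], len(dates)),
    "date": dates * 2,
    "value": rng.normal(size=2 * len(dates)),
})

# Merging on date yields one row per (id, id_2, date) combination.
# The clashing "value" columns become value_x (df1) and value_y (df2).
df = pd.merge(df1, df2, on="date").sort_values(["id", "id_2", "date"])

WINDOW = 6  # assumed window length, in rows (months)

pieces = []
for n, g in df.groupby(["id", "id_2"]):
    # Rolling Pearson correlation within one (id, id_2) pair.
    corr = g["value_x"].rolling(WINDOW).corr(g["value_y"])
    pieces.append(pd.DataFrame({
        "ids": [n] * len(g.date),  # n is the (id, id_2) group key
        "date": g.date,
        "corr": corr,
    }))

# Drop the first WINDOW - 1 rows of each pair, where the window is incomplete.
result = pd.concat(pieces).dropna(subset=["corr"]).reset_index(drop=True)
```

Note that the merged frame has one row per pair and date, so at the sizes mentioned in the question (roughly 15000 x 300 x 300) it may be too large to hold in memory at once; processing the ids in batches is one possible workaround.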
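Note: ids holds the group key n for each row.
Update:
You can rename the column headers of your example to match this answer.
You can change ids to fit the expected output by replacing
'ids':[n]*len(g.date)
with
'ids':['_'.join(n)]*len(g.date)
for example.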
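As a small illustration of that replacement, the sketch below joins tuple keys into labels such as d1_f130. The frame and the id_pair column name are hypothetical; the two corr values are taken from the question's expected output. The order inside each label follows the order of the group-by keys, so grouping by id_2 first gives d1_f130 rather than f130_d1.

```python
import pandas as pd

# Hypothetical result frame whose "ids" column holds (id_2, id) tuples.
result = pd.DataFrame({
    "ids": [("d1", "f130"), ("d1", "f131")],
    "date": [200706, 200706],
    "corr": [-0.375238392, 0.870717009],
})

# Join each tuple into a single label, e.g. ("d1", "f130") -> "d1_f130".
result["id_pair"] = result["ids"].map("_".join)
```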