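<p>IIUC, we can take the Cartesian product of the two dataframes, filter out the exact matches, and then apply some logic to the remaining rows to find the closest earlier date.</p>
<p>Finally, we concatenate the exact and non-exact matches into the final dataframe.</p>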
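<pre><code>import numpy as np
import pandas as pd

# Cartesian product: give both frames the same dummy key and merge on it.
s = pd.merge(
    df_sample.assign(key="var1"),
    df_main.assign(key="var1").rename(columns={"Time": "TimeDelta"}).drop(columns="Index"),
    on="key",
    how="outer",
).drop(columns="key")

# Rows where the sample time also appears in df_main.
extact_matches = s[s["Time"].eq(s["TimeDelta"])]
# Every candidate pair for sample times that have no exact match.
non_exact_matches_cart = s[~s["Time"].isin(extact_matches["Time"])]
# Difference in days; keep past dates only, then the smallest difference per Time.
non_exact_matches = (
    non_exact_matches_cart.assign(
        delta=(non_exact_matches_cart["Time"] - non_exact_matches_cart["TimeDelta"])
        / np.timedelta64(1, "D")
    )
    .query("delta >= 0")
    .sort_values(["Time", "delta"])
    .drop_duplicates("Time", keep="first")
    .drop(columns="delta")
)
</code></pre>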
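<p>As a side note, the dummy-key merge is only there to produce the Cartesian product. A minimal sketch of the same step using <code>how="cross"</code> (available in pandas 1.2 and later) is shown here; the frames <code>left</code> and <code>right</code> are hypothetical miniatures, not the data from the question.</p>

```python
import pandas as pd

# Hypothetical miniature frames, used only to show the shape of a cross join.
left = pd.DataFrame({"Time": pd.to_datetime(["2020-06-01", "2020-06-02"])})
right = pd.DataFrame(
    {"TimeDelta": pd.to_datetime(["2020-06-01", "2020-06-03", "2020-06-06"])}
)

# how="cross" pairs every row of `left` with every row of `right`,
# so no dummy key column has to be added and dropped again.
pairs = left.merge(right, how="cross")

print(pairs.shape)  # (6, 2): 2 rows x 3 rows
```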
<hr/>
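<p>There are a lot of variables above, but essentially we compute the time difference for every pair, discard any pair whose date lies in the future, and then drop duplicates so that only the closest past date remains for each <code>Time</code>.</p>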
<pre><code>df = (
    pd.concat([extact_matches, non_exact_matches], axis=0)
    .sort_values("Time")
    .rename(columns={"TimeDelta": "closest_time", "Actual": "closest val"})
)
print(df)

    Index       Time  Pred closest_time  closest val
0       1 2020-06-01   100   2020-06-01           90
3       2 2020-06-02  -200   2020-06-01           90
7       3 2020-06-03   300   2020-06-03          280
10      4 2020-06-04  -400   2020-06-03          280
13      5 2020-06-05  -500   2020-06-03          280
17      6 2020-06-06   600   2020-06-06          650
</code></pre>
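<p>If the Cartesian product gets too large, <code>pd.merge_asof</code> may be worth a look. It is a different technique from the one above, but with <code>direction="backward"</code> it also picks the most recent row at or before each <code>Time</code>. The sketch below is only an illustration: the input frames are assumptions reconstructed from the printed result (the <code>Index</code> column of <code>df_main</code> is left out), and the helper name <code>right</code> is hypothetical.</p>

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed result above.
df_sample = pd.DataFrame({
    "Index": [1, 2, 3, 4, 5, 6],
    "Time": pd.to_datetime([
        "2020-06-01", "2020-06-02", "2020-06-03",
        "2020-06-04", "2020-06-05", "2020-06-06",
    ]),
    "Pred": [100, -200, 300, -400, -500, 600],
})
df_main = pd.DataFrame({
    "Time": pd.to_datetime(["2020-06-01", "2020-06-03", "2020-06-06"]),
    "Actual": [90, 280, 650],
})

# merge_asof keeps only the left frame's key, so copy the matched
# timestamp into its own column before merging.
right = df_main.assign(closest_time=df_main["Time"]).rename(
    columns={"Actual": "closest val"}
)

# Both frames must be sorted by the key. direction="backward" selects the
# last row of `right` whose Time is less than or equal to the sample's Time.
result = pd.merge_asof(
    df_sample.sort_values("Time"),
    right.sort_values("Time"),
    on="Time",
    direction="backward",
)
print(result)
```

<p>This avoids materialising every pair of rows, at the cost of requiring both frames to be sorted on <code>Time</code>.</p>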