<p>Add a next-date column to the emails table:</p>
<pre><code>df_emails["NextDateSent"] = df_emails.groupby("CustID")["DateSent"].shift(-1)
</code></pre>
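<p>As a rough illustration of what the per-customer shift does, here is a minimal sketch on made-up data. The column names follow the answer; the values and the explicit sort are assumptions added for the example.</p>

```python
import pandas as pd

# Hypothetical sample data: column names match the answer, values are invented.
df_emails = pd.DataFrame({
    "CustID": [2, 2, 4],
    "DateSent": pd.to_datetime(["2018-01-20", "2018-02-19", "2018-01-10"]),
})

# Sort so that "next" means chronologically next within each customer.
df_emails = df_emails.sort_values(["CustID", "DateSent"]).reset_index(drop=True)

# shift(-1) moves each customer's following DateSent up by one row;
# the last email of every customer has no successor and gets NaT.
df_emails["NextDateSent"] = df_emails.groupby("CustID")["DateSent"].shift(-1)
print(df_emails)
```

<p>Selecting the <code>DateSent</code> column before shifting keeps the assignment well defined even if the emails table later gains more columns.</p>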
<p>Sort both tables for <code>merge_asof</code>, then match each trip to the most recent earlier email to build a trip lookup table:</p>
<pre><code>df_emails = df_emails.sort_values("DateSent")
df_trips = df_trips.sort_values("TripDate")
df_lookup = pd.merge_asof(df_trips, df_emails, by="CustID", left_on="TripDate", right_on="DateSent", direction="backward")
</code></pre>
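<p>To make the backward match concrete, the following sketch runs the same call on a small invented data set. The trip dates and spend values are assumptions for illustration only.</p>

```python
import pandas as pd

# Hypothetical emails and trips for one customer; values are invented.
df_emails = pd.DataFrame({
    "CustID": [2, 2],
    "DateSent": pd.to_datetime(["2018-01-20", "2018-02-19"]),
}).sort_values("DateSent")

df_trips = pd.DataFrame({
    "CustID": [2, 2, 2],
    "TripDate": pd.to_datetime(["2018-01-15", "2018-01-25", "2018-02-20"]),
    "TotalSpend": [10.0, 50.0, 250.0],
}).sort_values("TripDate")

# direction="backward" pairs each trip with the latest email sent on or
# before the trip date, matching only within the same CustID.
df_lookup = pd.merge_asof(
    df_trips, df_emails,
    by="CustID", left_on="TripDate", right_on="DateSent",
    direction="backward",
)
print(df_lookup)
```

<p>The trip on 2018-01-15 predates every email, so its <code>DateSent</code> comes back as <code>NaT</code>; the other two trips attach to the emails of 2018-01-20 and 2018-02-19 respectively.</p>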
<p>Aggregate the lookup table down to the data you need:</p>
<pre><code>df_lookup = df_lookup.loc[:, ["CustID", "DateSent", "TotalSpend"]].groupby(["CustID", "DateSent"]).agg(["count","sum"])
</code></pre>
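<p>Passing a list to <code>agg</code> produces two-level column labels such as <code>(TotalSpend, count)</code>. If flat column names are preferred, named aggregation is one possible alternative. The sketch below is not part of the original answer, and the names <code>TripCount</code> and <code>SpendSum</code> are hypothetical.</p>

```python
import pandas as pd

# Hypothetical lookup table (trips already matched to an email); values are invented.
df_lookup = pd.DataFrame({
    "CustID": [2, 2, 2, 4],
    "DateSent": pd.to_datetime(
        ["2018-01-20", "2018-01-20", "2018-02-19", "2018-02-26"]
    ),
    "TotalSpend": [100.0, 25.0, 250.0, 200.0],
})

# Named aggregation: each keyword becomes a flat output column,
# defined as (source column, aggregation function).
df_agg = df_lookup.groupby(["CustID", "DateSent"]).agg(
    TripCount=("TotalSpend", "count"),
    SpendSum=("TotalSpend", "sum"),
)
print(df_agg)
```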
<p>Left-join it back onto the emails table:</p>
<pre><code>df_merge = df_emails.join(df_lookup, on=["CustID", "DateSent"]).sort_values("CustID")
</code></pre>
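<p>I chose to leave the NaNs as NaNs because I don't like filling in default values. You can always fill them in later if you want, but if you set defaults up front you can no longer easily tell apart the things that existed from the things that didn't.</p>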
<pre><code>   CustID    DateSent  NextDateSent  (TotalSpend, count)  (TotalSpend, sum)
0       2  2018-01-20    2018-02-19                  2.0              125.0
1       2  2018-02-19    2018-03-31                  1.0              250.0
2       2  2018-03-31           NaT                  NaN                NaN
3       4  2018-01-10    2018-02-26                  NaN                NaN
4       4  2018-02-26           NaT                  2.0              200.0
5       5  2018-02-01    2018-02-07                  NaN                NaN
6       5  2018-02-07           NaT                  NaN                NaN
</code></pre>
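<p>If defaults are wanted later, one way to add them without losing the "no trips" information is to record a flag before filling. This is only a sketch: the flat column names <code>TripCount</code> and <code>SpendSum</code> and the data are assumptions, not the answer's actual output.</p>

```python
import pandas as pd

# Hypothetical merged result with flat column names; values are invented.
df_merge = pd.DataFrame({
    "CustID": [2, 4],
    "TripCount": [2.0, float("nan")],
    "SpendSum": [125.0, float("nan")],
})

# Record which emails had at least one matching trip before any filling.
had_trips = df_merge["TripCount"].notna()

# Fill defaults into a copy, leaving the NaNs in df_merge untouched.
df_filled = df_merge.fillna({"TripCount": 0, "SpendSum": 0.0})
print(df_filled.assign(HadTrips=had_trips))
```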