<p>I modified the original DataFrame to:</p>
<pre><code> itemId sellerId effectiveDate
19752572585893 31280 2005-12-31
19752572585893 31280 2006-02-28
19752592585894 31280 2008-01-31
19752592585894 5407 2007-07-31
19752592585894 5407 2008-03-31
19752592585894 5407 2008-01-31
</code></pre>
<p>From there, I filter each <code>itemId</code> down to its most recent year:</p>
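<pre><code>import pandas as pd

df['effectiveDate'] = pd.to_datetime(df['effectiveDate'])
# Per-group boolean mask: keep rows within 365 days of that item's latest date.
# Note: .values drops the index, so this assumes df is already sorted by itemId.
filtered = df[df.groupby(by=['itemId']).apply(lambda g:
    g['effectiveDate'] >=
    g['effectiveDate'].max() -
    pd.Timedelta(days=365)).values]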
</code></pre>
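<p>Because the mask above is applied positionally, it depends on the row order of <code>df</code>. As a hedged alternative (not the approach used in this answer), the same filter can be sketched with <code>transform('max')</code>, which returns a Series aligned with the original index. The helper name <code>filter_last_year</code> is hypothetical, and the sample frame is rebuilt from the table above so the block runs on its own:</p>

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "itemId": [19752572585893, 19752572585893, 19752592585894,
               19752592585894, 19752592585894, 19752592585894],
    "sellerId": [31280, 31280, 31280, 5407, 5407, 5407],
    "effectiveDate": ["2005-12-31", "2006-02-28", "2008-01-31",
                      "2007-07-31", "2008-03-31", "2008-01-31"],
})
df["effectiveDate"] = pd.to_datetime(df["effectiveDate"])


def filter_last_year(frame):
    """Keep rows within 365 days of each itemId's latest effectiveDate.

    Hypothetical helper: transform("max") broadcasts each group's maximum
    back onto the rows of that group, so the comparison is index-aligned
    and does not depend on how the rows are ordered.
    """
    latest = frame.groupby("itemId")["effectiveDate"].transform("max")
    return frame[frame["effectiveDate"] >= latest - pd.Timedelta(days=365)]


filtered_alt = filter_last_year(df)
```

<p>On the sample data every row falls within a year of its item's latest date, so nothing is dropped; the difference only shows up when older rows exist or the rows are not sorted by <code>itemId</code>.</p>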
<p>Then I combine the <code>sellerId</code> values for each item into a list:</p>
<pre><code>compressed = filtered.groupby(by=['itemId'])['sellerId'].agg(lambda x: x.unique().tolist())
</code></pre>
<p>All that remains is to get the latest date for each item and join it back to the filtered and combined data:</p>
<pre><code>max_dates = filtered.groupby(by=['itemId'])['effectiveDate'].max()
modified_df = pd.concat([compressed,max_dates],axis=1)
</code></pre>
<p>Result:</p>
<pre><code> sellerId effectiveDate
itemId
19752572585893 [31280] 2006-02-28
19752592585894 [31280, 5407] 2008-03-31
</code></pre>
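<p>The last two steps (collecting unique sellers and taking the latest date) could also be collapsed into one call using pandas named aggregation. This is a minimal sketch rather than the method used above; the name <code>summary</code> is an assumption, and the sample rows are reused as the already-filtered frame because all of them pass the one-year filter:</p>

```python
import pandas as pd

# Sample rows from the table above; all six pass the one-year filter,
# so they stand in for the filtered frame here.
filtered = pd.DataFrame({
    "itemId": [19752572585893, 19752572585893, 19752592585894,
               19752592585894, 19752592585894, 19752592585894],
    "sellerId": [31280, 31280, 31280, 5407, 5407, 5407],
    "effectiveDate": pd.to_datetime(["2005-12-31", "2006-02-28", "2008-01-31",
                                     "2007-07-31", "2008-03-31", "2008-01-31"]),
})

# Named aggregation: output column = (input column, aggregation function).
summary = filtered.groupby("itemId").agg(
    sellerId=("sellerId", lambda s: s.unique().tolist()),  # unique sellers, first-seen order
    effectiveDate=("effectiveDate", "max"),                # latest date per item
)
print(summary)
```

<p>This avoids the separate <code>pd.concat</code> step, at the cost of a slightly denser <code>agg</code> call.</p>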