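<p>We use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.diff.html#pandas.Series.diff" rel="nofollow noreferrer"><code>Series.diff</code></a> on the sorted DataFrame to compute the difference between consecutive <code>admit_time</code> values within each <code>id</code> group, and keep the rows where that difference is either <code>NaT</code> (i.e. the first row of each group) or at least 30 days. Finally, we drop the helper column <code>delta</code>:</p>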
<pre><code>df['delta'] = df.sort_values(['id', 'admit_time']).groupby('id')['admit_time'].transform(lambda x: x.diff())
df = df[df.delta.isna() | (df.delta >= pd.Timedelta(days=30))].drop(columns='delta')
</code></pre>
<p>Output:</p>
<pre><code>   id admit_time
0  30 2018-10-03
2  13 2017-11-01
3  13 2018-02-27
</code></pre>
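<p>For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample rows are hypothetical (the asker's actual frame is not reproduced here), chosen only so that the same row labels survive. It calls <code>.diff()</code> directly on the grouped column, which should be equivalent to the <code>transform(lambda x: x.diff())</code> form above:</p>

```python
import pandas as pd

# Hypothetical sample data, not the asker's actual frame.
df = pd.DataFrame({
    'id': [30, 30, 13, 13, 13],
    'admit_time': pd.to_datetime(
        ['2018-10-03', '2018-10-04', '2017-11-01', '2018-02-27', '2018-03-05']),
})

# Gap to the previous admission within each id. The result is aligned back
# to df by index on assignment, so the original row order is preserved.
df['delta'] = (df.sort_values(['id', 'admit_time'])
                 .groupby('id')['admit_time']
                 .diff())

# Keep the first row of each id (NaT) and any row at least 30 days
# after the previous admission for that id.
result = df[df.delta.isna() | (df.delta >= pd.Timedelta(days=30))].drop(columns='delta')
print(result)
```

<p>With this sample data, rows 1 and 4 are dropped because they fall 1 and 6 days after the previous admission for the same <code>id</code>.</p>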
<p><br/></p>
<p><strong>Update for the modified question:</strong></p>
<p>Group by <code>['id','note']</code> instead of just <code>'id'</code>:</p>
<pre><code>df['delta'] = df.sort_values(['id', 'admit_time']).groupby(['id','note'])['admit_time'].transform(lambda x: x.diff())
df = df[df.delta.isna() | (df.delta >= pd.Timedelta(days=30))].drop(columns='delta')
</code></pre>
<p>Result:</p>
<pre><code>   id admit_time           note
0  30 2018-10-03  note_content1
1  30 2018-10-03  note_content2
4  13 2017-11-01  note_content1
5  13 2018-02-27  note_content1
6  13 2018-02-27  note_content2
</code></pre>
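<p>Since the two versions differ only in the grouping keys, the logic could be wrapped in a small function. The sketch below is one way to do that; <code>keep_spaced_rows</code> is a hypothetical helper name, and the sample rows are made up so that the surviving row labels match the result above:</p>

```python
import pandas as pd

def keep_spaced_rows(df, keys, min_gap=pd.Timedelta(days=30)):
    """Hypothetical helper: keep each group's first row plus any row whose
    admit_time is at least min_gap after the previous row in its group."""
    delta = (df.sort_values(keys + ['admit_time'])
               .groupby(keys)['admit_time']
               .diff()
               .reindex(df.index))  # restore the original row order
    return df[delta.isna() | (delta >= min_gap)]

# Hypothetical sample data, not the asker's actual frame.
df = pd.DataFrame({
    'id': [30, 30, 30, 30, 13, 13, 13, 13],
    'admit_time': pd.to_datetime(
        ['2018-10-03', '2018-10-03', '2018-10-04', '2018-10-04',
         '2017-11-01', '2018-02-27', '2018-02-27', '2018-03-05']),
    'note': ['note_content1', 'note_content2', 'note_content1', 'note_content2',
             'note_content1', 'note_content1', 'note_content2', 'note_content1'],
})

result = keep_spaced_rows(df, ['id', 'note'])
print(result)
```

<p>Passing <code>['id']</code> as <code>keys</code> gives the behaviour of the first version of the answer.</p>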