<p>Try this:</p>
<pre class="lang-py prettyprint-override"><code>>>> import pandas as pd
>>> import numpy as np
>>> df=df.sort_values(by=["id", "admit_time"]) #in case your data is not sorted
>>> df_2=df.join(df.groupby("id").min(), on="id", how="left", rsuffix="_min")
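>>> same_id = df_2["id"] == df_2["id"].shift()  # True when the previous row has the same id
>>> admit = pd.to_datetime(df_2["admit_time"])
>>> # days since the previous admission; .dt.days is used because newer pandas versions reject astype('timedelta64[D]')
>>> df_2["time_diff"] = np.where(same_id, (admit - admit.shift()).dt.days, 0)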
>>> df_2
admit_time id admit_time_min time_diff
0 2018-10-03 30 2018-10-03 0.0
1 2018-10-29 30 2018-10-03 26.0
2 2017-11-01 13 2017-11-01 0.0
3 2018-02-27 13 2017-11-01 118.0
>>> df_2[(df_2["admit_time"]==df_2["admit_time_min"]) | (df_2["time_diff"]>=30)]
admit_time id admit_time_min time_diff
0 2018-10-03 30 2018-10-03 0.0
2 2017-11-01 13 2017-11-01 0.0
3 2018-02-27 13 2017-11-01 118.0
</code></pre>
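<p>The same filter can also be sketched with a per-group <code>diff</code>, which avoids the join and the <code>shift</code> comparison. This is only a minimal sketch: the sample frame below is an assumption reconstructed from the output shown above, and the names <code>days_since_prev</code> and <code>kept</code> are illustrative.</p>

```python
import pandas as pd

# Sample data mirroring the output above (assumed, for illustration only).
df = pd.DataFrame({
    "admit_time": ["2018-10-03", "2018-10-29", "2017-11-01", "2018-02-27"],
    "id": [30, 30, 13, 13],
})
df["admit_time"] = pd.to_datetime(df["admit_time"])
df = df.sort_values(by=["id", "admit_time"])

# Days since the previous admission of the same id; NaN on each id's first row.
days_since_prev = df.groupby("id")["admit_time"].diff().dt.days

# Keep each id's first admission, plus any admission 30 or more days after the previous one.
kept = df[days_since_prev.isna() | (days_since_prev >= 30)]
print(kept)
```

<p>Because the difference is computed within each <code>id</code> group, the first row of every group is naturally <code>NaN</code>, so no separate minimum column is needed.</p>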
<p>Two notes:</p>
<p>(1) You need to sort the data by <code>id, admit_time</code> first.</p>
<p>(2) I did not find an equivalent of <code>dense_rank</code>, so this does a regular <code>rank</code>.</p>
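<p>Regarding note (2), pandas' <code>rank</code> accepts <code>method="dense"</code>, which may serve as a <code>dense_rank</code> equivalent when applied per group. A minimal sketch, with made-up sample data and illustrative column names:</p>

```python
import pandas as pd

# Made-up sample data with a tie inside id 30 (for illustration only).
df = pd.DataFrame({
    "admit_time": pd.to_datetime(
        ["2018-10-03", "2018-10-29", "2018-10-29", "2018-11-15", "2017-11-01"]
    ),
    "id": [30, 30, 30, 30, 13],
})

by_id = df.groupby("id")["admit_time"]
# method="dense": tied rows share a rank and the next rank follows without a gap.
df["dense_rank"] = by_id.rank(method="dense").astype(int)
# method="min": tied rows share the lowest rank and the next rank skips ahead.
df["regular_rank"] = by_id.rank(method="min").astype(int)
print(df)
```

<p>The two columns differ only on the row after the tie, which is where a dense rank and a regular rank diverge.</p>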