<p>我假设您检查了<code>value</code>列和<code>ID</code>列上的重复项,并进一步检查了<code>date</code>列的<code>date</code></p>
<pre><code>df.loc[df.assign(d=df.date.dt.date).duplicated(['value','ID', 'd']), 'value'] = np.nan
Out[269]:
date value ID
0 2019-01-01 00:00:00 10.0 Jackie
1 2019-01-01 01:00:00 NaN Jackie
2 2019-01-01 02:00:00 NaN Jackie
3 2019-01-01 03:00:00 NaN Jackie
4 2019-09-01 02:00:00 12.0 Zoop
5 2019-09-01 03:00:00 NaN Zoop
6 2019-09-01 04:00:00 NaN Zoop
7 2019-09-01 05:00:00 NaN Zoop
</code></pre>
<p>正如@Trenton所建议的,您可以使用<code>pd.NA</code>来避免导入numpy</p>
<p>(<em>注意</em>:as@rafaelc sugguest:下面是解释<code>pd.NA</code>和<code>np.nan</code>{a1}之间细节差异的链接)</p>
<pre><code>df.loc[df.assign(d=df.date.dt.date).duplicated(['value','ID', 'd']), 'value'] = pd.NA
Out[273]:
date value ID
0 2019-01-01 00:00:00 10 Jackie
1 2019-01-01 01:00:00 <NA> Jackie
2 2019-01-01 02:00:00 <NA> Jackie
3 2019-01-01 03:00:00 <NA> Jackie
4 2019-09-01 02:00:00 12 Zoop
5 2019-09-01 03:00:00 <NA> Zoop
6 2019-09-01 04:00:00 <NA> Zoop
7 2019-09-01 05:00:00 <NA> Zoop
</code></pre>