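<p>Let's try <code>assign</code> and <code>dt.round</code>.</p>
<p>The idea is to measure how far each timestamp is from the nearest hour, sort the rows by that distance, and keep the first row for each <code>name</code>.</p>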
<pre><code>import pandas as pd
import numpy as np

df2 = (
    df.assign(
        # absolute distance to the nearest hour, in minutes
        hour_diff=(df["refTime"].dt.round("h") - df["refTime"]).abs()
        / np.timedelta64(1, "m")
    )
    .sort_values("hour_diff")
    .drop_duplicates(subset=["name"], keep="first")
    .drop("hour_diff", axis=1)
)
print(df2)
    target              refTime  name    latitude    longitude
5      5.0  2020-05-31 23:00:00  SGES  -25.450001   -54.849998
6      5.0  2020-05-31 23:00:00  SGAS  -25.250000   -57.520000
8      8.0  2020-05-31 23:00:00  NFFN  -17.750000   177.449997
9      7.0  2020-05-31 23:00:00  SBPS  -16.430000   -39.080002
7      5.0  2020-05-31 22:59:00  SUMU  -34.830002   -56.000000
2      6.0  2020-05-31 22:56:00  YMAY  -36.060001   146.929993
1      6.0  2020-05-31 22:51:00  YWGT  -36.419998   146.300003
10     7.0  2020-05-31 22:50:00  NSTU  -14.330000  -170.720001
0      5.0  2020-05-31 22:48:00  YMLT  -41.529999   147.190002
4      3.0  2020-05-31 22:46:00  FACT  -33.990002    18.600000
</code></pre>
<p>The distance measure looks like this:</p>
<pre><code>df.assign(
    # absolute distance to the nearest hour, in minutes
    hour_diff=(df["refTime"].dt.round("h") - df["refTime"]).abs()
    / np.timedelta64(1, "m")
)
    target              refTime  name    latitude    longitude  hour_diff
0      5.0  2020-05-31 22:48:00  YMLT  -41.529999   147.190002       12.0
1      6.0  2020-05-31 22:51:00  YWGT  -36.419998   146.300003        9.0
2      6.0  2020-05-31 22:56:00  YMAY  -36.060001   146.929993        4.0
3      5.0  2020-05-31 22:47:00  SUMU  -34.830002   -56.000000       13.0  # we drop this
4      3.0  2020-05-31 22:46:00  FACT  -33.990002    18.600000       14.0
5      5.0  2020-05-31 23:00:00  SGES  -25.450001   -54.849998        0.0
6      5.0  2020-05-31 23:00:00  SGAS  -25.250000   -57.520000        0.0
7      5.0  2020-05-31 22:59:00  SUMU  -34.830002   -56.000000        1.0  # we keep this one
8      8.0  2020-05-31 23:00:00  NFFN  -17.750000   177.449997        0.0
9      7.0  2020-05-31 23:00:00  SBPS  -16.430000   -39.080002        0.0
10     7.0  2020-05-31 22:50:00  NSTU  -14.330000  -170.720001       10.0
</code></pre>
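<p>For a version that runs on its own, here is a minimal sketch of the same sort-and-deduplicate idea. The helper name <code>closest_to_hour</code> and its parameters are hypothetical. The sample frame reuses three rows from the table above and adds one made-up SUMU reading at 23:02, which is not in the original data, to show why the absolute value of the difference matters when a timestamp falls just after the hour.</p>

```python
import pandas as pd

# Sample data: three rows from the table above, plus one invented SUMU
# reading at 23:02 (an assumption, not part of the original data set).
df = pd.DataFrame(
    {
        "target": [5.0, 5.0, 5.0, 5.0],
        "refTime": pd.to_datetime(
            [
                "2020-05-31 22:48:00",
                "2020-05-31 22:47:00",
                "2020-05-31 22:59:00",
                "2020-05-31 23:02:00",
            ]
        ),
        "name": ["YMLT", "SUMU", "SUMU", "SUMU"],
    }
)


def closest_to_hour(frame, time_col="refTime", key="name"):
    """Keep, for each `key`, the row whose `time_col` is nearest a full hour."""
    # Absolute distance to the nearest hour, in minutes. Without .abs(),
    # a timestamp after the hour gives a negative value and sorts first.
    dist = (frame[time_col].dt.round("h") - frame[time_col]).abs()
    minutes = dist / pd.Timedelta(minutes=1)
    return (
        frame.assign(hour_diff=minutes)
        .sort_values("hour_diff", kind="stable")  # stable: ties keep input order
        .drop_duplicates(subset=[key], keep="first")
        .drop(columns="hour_diff")
    )


result = closest_to_hour(df)
print(result)
```

<p>With this sample, the SUMU row at 22:59 (one minute from the hour) is kept over the 23:02 row (two minutes away). Sorting on the signed difference instead would have picked the 23:02 row.</p>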