Find the timestamp of the next non-NaN value

Posted 2024-09-20 03:47:37


I have a DataFrame that looks like this:

A    B   datetime
10  NaN  12-03-2020  04:43:11
NaN 20   13-03-2020  04:43:11
NaN NaN  14-03-2020  04:43:11
NaN NaN  15-03-2020  04:43:11
NaN NaN  16-03-2020  04:43:11
NaN 50   17-03-2020  04:43:11
20  NaN  18-03-2020  04:43:11
NaN 30   19-03-2020  04:43:11
NaN NaN  20-03-2020  04:43:11
30  30   21-03-2020  04:43:11
40  NaN  22-03-2020  04:43:11
NaN 10   23-03-2020  04:43:11

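The logic is: for each row where column A is not NaN, find the next nearest row where column B is not NaN and return that row's timestamp.

To implement this logic, I am using the following code: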

df['cond1'] = df['A'].notna()

for t in range(1,5): 
    if df['cond1'] == True:
        df['next_ts'] = np.where(df['B'].shift(-t).notna(),df['datetime'].shift(-t),np.datetime64('NaT'))
    else:
        None

Running the code above gives the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The desired output looks like this:

A    B   datetime                next_ts
10  NaN  12-03-2020  04:43:11    NaN 
NaN 20   13-03-2020  04:43:11    NaN 
NaN NaN  14-03-2020  04:43:11    NaN 
NaN NaN  15-03-2020  04:43:11    NaN 
NaN NaN  16-03-2020  04:43:11    NaN 
NaN 50   17-03-2020  04:43:11    NaN 
20  NaN  18-03-2020  04:43:11    19-03-2020  04:43:11
NaN 30   19-03-2020  04:43:11    NaN
NaN NaN  20-03-2020  04:43:11    NaN 
30  30   21-03-2020  04:43:11    22-03-2020  04:43:11
40  NaN  22-03-2020  04:43:11    23-03-2020  04:43:11
NaN 10   23-03-2020  04:43:11    NaN

Could someone please help me implement this logic?

1 answer
User
#1 · Posted 2024-09-20 03:47:37
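The ValueError in the question most likely comes from `if df['cond1'] == True:`. That comparison produces a boolean Series with one value per row, and Python's `if` needs a single True or False. The sketch below reproduces the error on a tiny, made-up frame (the three-row data and the names `cond`, `error_message` and `flag` are illustrative assumptions, not part of the question) and shows an element-wise selection with `np.where` instead.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (assumed data, not the asker's).
df = pd.DataFrame({"A": [10, np.nan, 20]})
cond = df["A"].notna()  # boolean Series, one value per row

error_message = None
try:
    if cond == True:  # a whole Series has no single truth value
        pass
except ValueError as exc:
    error_message = str(exc)

# Element-wise alternative: choose a value per row instead of branching once.
flag = np.where(cond, "set", "missing")
```

The same idea applies to the real problem: build a per-row mask and let pandas or NumPy apply it to every row at once, rather than wrapping the assignment in `if`/`else`.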
import datetime

import numpy as np
import pandas as pd


df = pd.DataFrame({
    "A": [10, None, None, None, None, None, 20, None, None, 30, 40, None],
    "B": [None, 20, None, None, None, 50, None, 30, None, 30, None, 10],
    "datetime": [datetime.datetime(2020, 3, 12, 4, 43, 11) + datetime.timedelta(days=i) for i in range(12)]
}).astype({"A": "Int64", "B": "Int64"})

df["next_ts"] = df["datetime"].where(df["B"].notna())  # keeps the timestamp where B is set, NaT elsewhere
df["next_ts"] = df["next_ts"].bfill()  # propagates the values of next_ts backward where they are null (fillna(method="bfill") is deprecated in recent pandas)
df["next_ts"] = df["next_ts"].where(df["A"].notna())  # eliminates the values of next_ts where A is null

print(df)



       A     B            datetime             next_ts
0     10  <NA> 2020-03-12 04:43:11 2020-03-13 04:43:11
1   <NA>    20 2020-03-13 04:43:11                 NaT
2   <NA>  <NA> 2020-03-14 04:43:11                 NaT
3   <NA>  <NA> 2020-03-15 04:43:11                 NaT
4   <NA>  <NA> 2020-03-16 04:43:11                 NaT
5   <NA>    50 2020-03-17 04:43:11                 NaT
6     20  <NA> 2020-03-18 04:43:11 2020-03-19 04:43:11
7   <NA>    30 2020-03-19 04:43:11                 NaT
8   <NA>  <NA> 2020-03-20 04:43:11                 NaT
9     30    30 2020-03-21 04:43:11 2020-03-21 04:43:11
10    40  <NA> 2020-03-22 04:43:11 2020-03-23 04:43:11
11  <NA>    10 2020-03-23 04:43:11                 NaT
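Note that row 9 above gets its own timestamp, because B is set on the same row as A. The `range(1, 5)` loop in the question suggests the asker may instead want only strictly later rows, looking at most four rows ahead. That reading is an assumption, not something the question states outright. Under that assumption, a minimal sketch could shift the B timestamps up by one row before back-filling (the column name `next_ts_strict` and the helper names `b_ts` and `later` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [10, None, None, None, None, None, 20, None, None, 30, 40, None],
    "B": [None, 20, None, None, None, 50, None, 30, None, 30, None, 10],
    "datetime": pd.date_range("2020-03-12 04:43:11", periods=12, freq="D"),
})

# Timestamp where B is set, NaT elsewhere.
b_ts = df["datetime"].where(df["B"].notna())

# shift(-1) makes row i see row i+1, so the current row is excluded.
# bfill(limit=3) then reaches up to three rows further, i.e. rows i+1..i+4.
later = b_ts.shift(-1).bfill(limit=3)

# Keep the result only on rows where A is set.
df["next_ts_strict"] = later.where(df["A"].notna())
```

With this variant, row 9 would point at the timestamp of row 11 rather than its own, while rows 0, 6 and 10 keep the same values as in the output above.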
