Pandas: comparing a Datetime column against Datetime arrays

Posted 2024-05-18 10:18:15


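I'm learning pandas, and right now Datetimes in particular. I'm looking for a way to select rows by a Datetime column: a row should be selected if its Datetime value falls within the range between the corresponding values of the arrays spacex and clonx.

The two arrays: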

clonx = array(['2019-08-14T23:32:00.000000000', '2019-08-14T23:35:00.000000000',
       '2019-08-14T23:35:00.000000000', ...,
       '2020-05-24T14:55:00.000000000', '2020-05-24T15:03:00.000000000',
       '2020-05-25T12:09:00.000000000'], dtype='datetime64[ns]')

spacex = array(['2019-08-14T23:27:00.000000000', '2019-08-14T23:30:00.000000000',
   '2019-08-14T23:30:00.000000000', ...,
   '2020-05-24T14:50:00.000000000', '2020-05-24T14:58:00.000000000',
   '2020-05-25T12:04:00.000000000'], dtype='datetime64[ns]')

The column:

    first['datim']

0      2019-08-14 23:26:00
1      2019-08-14 23:26:00
2      2019-08-14 23:27:00
3      2019-08-14 23:30:00
4      2019-08-14 23:30:00
               ...        
5101   2020-05-25 20:48:00
5102   2020-05-25 20:49:00
5103   2020-05-26 13:52:00
5104   2020-05-26 13:52:00
5105   2020-05-26 14:22:00
Name: datim, Length: 3172, dtype: datetime64[ns]

How can I get the datetime values from the first['datim'] column that fall between spacex and clonx?

Something like this:

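import pandas as pd

# Combine one boolean mask per (spacex[i], clonx[i]) pair
final = pd.Series(False, index=first.index)
for i in range(len(spacex)):
    start_date = spacex[i]
    end_date = clonx[i]
    final |= (first['datim'] >= start_date) & (first['datim'] <= end_date)
result = first[final]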

Or use between_time, but I couldn't find a way to make it work with arrays.

Thanks for your time.


2 Answers

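You can use apply to add a column to the DataFrame based on how each 'datim' datetime compares with the two datetime arrays. This won't scale well to large amounts of data, but it may be fine for your case. For example, this tells you whether the time falls between any of the datetime pairs (as in @Pygirl's answer):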

def between_any(time):
    for s,c in zip(spacex, clonx):
        if (time  >= s) and (time <= c):
            return True
    return False

df['Between Any'] = df['datim'].apply(between_any)
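
Because apply calls the Python function once per row, a vectorized variant may scale better. The following is only a sketch using NumPy broadcasting, not part of the original answer: the helper name between_any_vectorized and the small sample data are made up for illustration. It builds an (n_rows x n_pairs) boolean matrix, so memory use grows with both sizes.

```python
import numpy as np
import pandas as pd

def between_any_vectorized(times, starts, ends):
    """Return a boolean array: True where a time lies in any [start, end] pair.

    Hypothetical helper for illustration; bounds are inclusive on both sides.
    """
    t = pd.to_datetime(times).to_numpy()[:, None]   # shape (n_rows, 1)
    s = pd.to_datetime(starts).to_numpy()[None, :]  # shape (1, n_pairs)
    e = pd.to_datetime(ends).to_numpy()[None, :]    # shape (1, n_pairs)
    # Broadcasting compares every time against every pair at once.
    return ((t >= s) & (t <= e)).any(axis=1)

# Illustrative sample data (an assumption, trimmed from the question)
df = pd.DataFrame({"datim": pd.to_datetime([
    "2019-08-14 23:26:00", "2019-08-14 23:27:00",
    "2019-08-14 23:30:00", "2020-05-25 20:48:00"])})
spacex = pd.to_datetime(["2019-08-14 23:27:00", "2019-08-14 23:30:00"])
clonx = pd.to_datetime(["2019-08-14 23:32:00", "2019-08-14 23:35:00"])

df["Between Any"] = between_any_vectorized(df["datim"], spacex, clonx)
```

For very long arrays the full matrix may not fit in memory, in which case processing the column in chunks is one option.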

Alternatively, you can get the indices of the date pairs that the value falls between:

import numpy as np

def between_index(time):
    output = []
    for i in range(len(spacex)):
        if (time  >= spacex[i]) and (time <= clonx[i]):
            output.append(i)
    return output if output else np.nan

df['Between Indices'] = df['datim'].apply(between_index)

Or you can get the actual timestamp pairs that the value falls between:

def between_values(time):
    output = []
    for s,c in zip(spacex, clonx):
        if (time  >= s) and (time <= c):
            output.append((s,c))
    return output if output else np.nan

df['Between Values'] = df['datim'].apply(between_values)
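
Lists stored in cells can be awkward to filter later. As a possible alternative, a long-format table with one row per (datim, matching pair) can be built with a cross join. This is a sketch under assumptions: it needs a pandas version that supports how="cross" in merge (1.2 or later), and the names pairs, pair_index, crossed and matches are invented here, as is the sample data.

```python
import pandas as pd

# Illustrative sample data (an assumption, trimmed from the question)
df = pd.DataFrame({"datim": pd.to_datetime([
    "2019-08-14 23:26:00", "2019-08-14 23:27:00", "2019-08-14 23:30:00"])})
pairs = pd.DataFrame({
    "start": pd.to_datetime(["2019-08-14 23:27:00", "2019-08-14 23:30:00"]),
    "end": pd.to_datetime(["2019-08-14 23:32:00", "2019-08-14 23:35:00"]),
})
pairs["pair_index"] = pairs.index  # remember which pair each row came from

# Pair every row of df with every interval, then keep only the matches.
crossed = df.reset_index().merge(pairs, how="cross")
inside = crossed["datim"].between(crossed["start"], crossed["end"])
matches = crossed.loc[inside, ["index", "datim", "pair_index", "start", "end"]]
matches = matches.reset_index(drop=True)
```

Here the "index" column holds the original row label from df, so a row that falls inside several pairs simply appears several times. Like the apply version, this touches every row/pair combination, so it is meant for readability rather than speed.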

Here are the results based on your data:

In[0]: df

Out[0]:
                   datim
0    2019-08-14 23:26:00
1    2019-08-14 23:26:00
2    2019-08-14 23:27:00
3    2019-08-14 23:30:00
4    2019-08-14 23:30:00
5101 2020-05-25 20:48:00
5102 2020-05-25 20:49:00
5103 2020-05-26 13:52:00
5104 2020-05-26 13:52:00
5105 2020-05-26 14:22:00

In[1]:

clonx = pd.Series(['2019-08-14T23:32:00.000000000', '2019-08-14T23:35:00.000000000','2019-08-14T23:35:00.000000000','2020-05-24T14:55:00.000000000', '2020-05-24T15:03:00.000000000','2020-05-25T12:09:00.000000000'])

spacex = pd.Series(['2019-08-14T23:27:00.000000000', '2019-08-14T23:30:00.000000000','2019-08-14T23:30:00.000000000','2020-05-24T14:50:00.000000000', '2020-05-24T14:58:00.000000000','2020-05-25T12:04:00.000000000'])

clonx = pd.to_datetime(clonx)
spacex = pd.to_datetime(spacex)

df['Between Any'] = df['datim'].apply(between_any)
df['Between Indices'] = df['datim'].apply(between_index)
df['Between Values'] = df['datim'].apply(between_values)

df

Out[1]:

                   datim  Between Any Between Indices  \
0    2019-08-14 23:26:00        False             NaN   
1    2019-08-14 23:26:00        False             NaN   
2    2019-08-14 23:27:00         True             [0]   
3    2019-08-14 23:30:00         True       [0, 1, 2]   
4    2019-08-14 23:30:00         True       [0, 1, 2]   
5101 2020-05-25 20:48:00        False             NaN   
5102 2020-05-25 20:49:00        False             NaN   
5103 2020-05-26 13:52:00        False             NaN   
5104 2020-05-26 13:52:00        False             NaN   
5105 2020-05-26 14:22:00        False             NaN   

                                         Between Values  
0                                                   NaN  
1                                                   NaN  
2          [(2019-08-14 23:27:00, 2019-08-14 23:32:00)]  
3     [(2019-08-14 23:27:00, 2019-08-14 23:32:00), (...  
4     [(2019-08-14 23:27:00, 2019-08-14 23:32:00), (...  
5101                                                NaN  
5102                                                NaN  
5103                                                NaN  
5104                                                NaN  
5105                                                NaN  

This is not the best solution, though:

datelist = []
for i in range(len(first.datim)):
    for j in range(len(clonx)):
        if (spacex[j]<=first.datim[i]) and (first.datim[i]<=clonx[j]):
            datelist.append(first.datim[i])
print(set(datelist))

{Timestamp('2019-08-14 23:30:00'), Timestamp('2019-08-14 23:27:00')}
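
The nested loop above checks every row against every pair. If that becomes too slow, one possible alternative is to sort the pairs by start time and use a binary search. The sketch below is an assumption-laden illustration rather than the answerer's method: the helper names to_ns and in_any_interval are hypothetical, NaT values are not handled, and the sample data is a small subset of the question's values.

```python
import numpy as np
import pandas as pd

def to_ns(values):
    """Convert datetime-like values to int64 nanoseconds since the epoch."""
    arr = pd.to_datetime(values).to_numpy()
    return arr.astype("datetime64[ns]").astype("int64")

def in_any_interval(times, starts, ends):
    """Boolean mask: True where a time lies in at least one closed [start, end]."""
    t, s, e = to_ns(times), to_ns(starts), to_ns(ends)
    order = np.argsort(s, kind="stable")
    s = s[order]
    # Running maximum of end points in start order: only intervals with
    # start <= t can contain t, and the largest end among them decides.
    e_max = np.maximum.accumulate(e[order])
    # k = number of intervals whose start is <= t
    k = np.searchsorted(s, t, side="right")
    mask = np.zeros(len(t), dtype=bool)
    has_start = k > 0
    mask[has_start] = t[has_start] <= e_max[k[has_start] - 1]
    return mask

# Illustrative sample data (an assumption, taken from the values shown above)
first = pd.DataFrame({"datim": pd.to_datetime([
    "2019-08-14 23:26:00", "2019-08-14 23:27:00",
    "2019-08-14 23:30:00", "2020-05-25 20:48:00"])})
spacex = pd.to_datetime(["2019-08-14 23:27:00", "2019-08-14 23:30:00",
                         "2020-05-25 12:04:00"])
clonx = pd.to_datetime(["2019-08-14 23:32:00", "2019-08-14 23:35:00",
                        "2020-05-25 12:09:00"])

mask = in_any_interval(first["datim"], spacex, clonx)
matching = set(first.loc[mask, "datim"])
```

With n rows and m pairs this does roughly O((n + m) log m) work instead of O(n * m), at the cost of only answering "is it in any pair" and not "which pair".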
