Increment a count by 1 in Python when a timestamp in one table falls between two timestamps in another table

Posted 2024-07-02 13:41:59


Question summary:

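I want to count how many ambulances are responding during every minute of the day over an entire calendar year. I generate two dataframes. The first holds the ambulances' emergency responses: the timestamp at which each emergency started and the timestamp at which the ambulance cleared it. This data comes from our database. For example, an ambulance responded to a cardiac arrest at 2020-01-01 00:30:17 and cleared from that response at 2020-01-01 00:38:05.000. Let's call this dataframe "emergency_events".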

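The second dataframe is built from the minimum and maximum of emergency_events: the earliest start timestamp and the latest end timestamp become the start and end points of a new dataframe. It steps from the start point to the end point in one-minute increments and stores a zero as a placeholder for the number of trucks working. Let's call this second dataframe "coincident", because we want to count the ambulances working at the same time within each one-minute slot.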

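In other words, the first emergency event starts at "2020-01-01 00:00:28", so the "coincident" table takes that value and adds one minute at a time until the last emergency event's end timestamp. For example, the "coincident" table looks like this: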

calendar_timestamp      TrucksWorking
2020-01-01 00:00:28              0
2020-01-01 00:01:28              0
2020-01-01 00:02:28              0
2020-01-01 00:03:28              0
2020-01-01 00:04:28              0
2020-01-01 00:05:28              0
...

Note how it increments by one minute, with a placeholder 0 for the number of ambulances working.

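So now there are two dataframes: "emergency_events" and "coincident". The goal of the program is to take the first observation of the "coincident" table and test it against every row of the "emergency_events" table: does the "coincident" timestamp fall between that event's start time and end time? If so, increment TrucksWorking by 1. Then repeat for every "coincident" observation, checking whether it falls within any "emergency_events" interval and incrementing by 1 each time it does.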
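The test described above (is each minute inside each event's start/end window?) can be expressed as a single boolean matrix with one row per minute and one column per event, then summed across each row. The sketch below is only an illustration: it uses two of the sample calls, assumes both columns are already parsed with `pd.to_datetime`, and treats both ends of an interval as inclusive, as the description implies.

```python
import pandas as pd

# Two of the sample calls from the question, parsed as datetimes
emergency_events = pd.DataFrame({
    'StartTime': pd.to_datetime(['2020-01-01 00:00:28', '2020-01-01 00:02:34']),
    'EndTime': pd.to_datetime(['2020-01-01 00:35:28.987', '2020-01-01 01:05:13.540']),
})

# One-minute grid from the first start to the last end, as in the question
days = pd.date_range(emergency_events['StartTime'].min(),
                     emergency_events['EndTime'].max(), freq='min')
coincident = pd.DataFrame({'calendar_timestamp': days, 'TrucksWorking': 0})

# Column vector of minutes (n_minutes x 1) against row vectors of events (n_events,)
t = coincident['calendar_timestamp'].to_numpy()[:, None]
starts = emergency_events['StartTime'].to_numpy()
ends = emergency_events['EndTime'].to_numpy()

# inside[i, j] is True when minute i lies within event j (both ends inclusive)
inside = (starts <= t) & (t <= ends)

# Summing each row counts the events active at that minute
coincident['TrucksWorking'] = inside.sum(axis=1)
print(coincident.head())
```

The matrix holds one byte per (minute, event) pair, so for a full year with many calls it may need to be built in chunks of minutes.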

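At the end of the program, this produces a dataframe in one-minute increments with the number of ambulances working at that time. With this data I can statistically analyze how many ambulances are working at any given time, and even break it down by hour, weekday, daytime/nighttime, and so on. This is very powerful information.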
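Once the per-minute table exists, the breakdowns mentioned above are ordinary `groupby` operations on the `.dt` accessor. In this sketch the `TrucksWorking` values are synthetic placeholders, not real data, and the 07:00 to 18:59 "day" window is an arbitrary assumption to adjust as needed.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a finished "coincident" table: two days of minutes
# (2020-01-06 was a Monday) with placeholder counts, NOT real data
days = pd.date_range('2020-01-06 00:00', periods=2 * 24 * 60, freq='min')
coincident = pd.DataFrame({'calendar_timestamp': days,
                           'TrucksWorking': np.arange(len(days)) % 5})

ts = coincident['calendar_timestamp']

# Mean number of ambulances busy in each hour of the day (0-23)
by_hour = coincident.groupby(ts.dt.hour)['TrucksWorking'].mean()

# Mean per weekday name
by_weekday = coincident.groupby(ts.dt.day_name())['TrucksWorking'].mean()

# Day/night split; "day" = 07:00-18:59 is an arbitrary assumption
shift = np.where(ts.dt.hour.between(7, 18), 'day', 'night')
by_shift = coincident.groupby(shift)['TrucksWorking'].agg(['mean', 'max'])
print(by_shift)
```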

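But I'm stuck on the logic and need your help. Specifically, I don't know how to add 1 when a timestamp from the "coincident" table falls within an interval in the "emergency_events" table.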

What I have tried:

for each in coincident.calendar_timestamp:
    if (coincident[coincident['calendar_timestamp']] >= emergency_events[emergency_events['StartTime']] & coincident[coincident['calendar_timestamp']] <= emergency_events[emergency_events['EndTime']]):
        coincident[coincident['TrucksWorking']] = coincident[coincident['TrucksWorking']] + 1
    else:
        coincident[coincident['TrucksWorking']]
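This attempt cannot work as written: `coincident[coincident['calendar_timestamp']]` asks for columns whose labels are the timestamp values, `&` binds more tightly than `>=` and `<=` so each comparison needs its own parentheses, and `if` cannot turn a whole Series or DataFrame into a single True/False (which is what the `ValueError: The truth value of a DataFrame is ambiguous` message reports). A minimal sketch of the same loop, comparing one timestamp at a time against all events and using two of the sample calls, might look like this:

```python
import pandas as pd

emergency_events = pd.DataFrame({
    'StartTime': pd.to_datetime(['2020-01-01 00:00:28', '2020-01-01 00:02:34']),
    'EndTime': pd.to_datetime(['2020-01-01 00:35:28.987', '2020-01-01 01:05:13.540']),
})
coincident = pd.DataFrame({
    'calendar_timestamp': pd.date_range('2020-01-01 00:00:28', periods=5, freq='min'),
    'TrucksWorking': 0,
})

for i, t in coincident['calendar_timestamp'].items():
    # Each comparison yields a boolean Series (one entry per event); the
    # parentheses are required because & binds tighter than <= and >=
    active = (emergency_events['StartTime'] <= t) & (emergency_events['EndTime'] >= t)
    # .sum() turns the boolean Series into one count, so no `if` is needed
    coincident.loc[i, 'TrucksWorking'] = active.sum()
print(coincident)
```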


My full code:

# -*- coding: utf-8 -*-
# Python 3.7 Anaconda distribution
import pandas as pd
import datetime

# =============================================================================
# Step 1: Read in the ambulance runs with a starting and ending time values
# call this dataframe "emergency_events"
# =============================================================================

# the following array is a small sample when an ambulance starts a call and when it ends a call
data = [['2020-01-01 00:00:28.000','2020-01-01 00:35:28.987']
        , ['2020-01-01 00:02:34.000','2020-01-01 01:05:13.540']
        , ['2020-01-01 00:03:57.000','2020-01-01 01:14:44.537']
        , ['2020-01-01 00:06:17.000','2020-01-01 01:26:52.087']
        , ['2020-01-01 00:13:20.000','2020-01-01 01:17:31.310']
        , ['2020-01-01 00:14:01.000','2020-01-01 01:57:28.343']
        , ['2020-01-01 00:16:11.000','2020-01-01 00:39:34.967']
        , ['2020-01-01 00:22:03.000','2020-01-01 01:46:40.037']
        , ['2020-01-01 00:23:07.000','2020-01-01 00:49:25.890']
        , ['2020-01-01 00:23:19.000','2020-01-01 01:26:39.920']
        , ['2020-01-01 00:30:17.000','2020-01-01 00:38:05.000']] 

#convert the array to a pandas data frame
emergency_events = pd.DataFrame(data, columns = ['StartTime', 'EndTime'])

#convert the string values to date time values
emergency_events['StartTime'] = pd.to_datetime(emergency_events['StartTime'])
emergency_events['EndTime'] = pd.to_datetime(emergency_events['EndTime'])

# =============================================================================
# Step 2 Create a calendar of date time stamps incremented by 1 minute using the ambulance runs min/max values
# call this dataframe "coincident"
# =============================================================================

## establish a starting value based on the first ambulance event
starting_point = emergency_events.StartTime.min()
print(starting_point)
## establish an ending value based on the final ambulance call ending time.
ending_point = emergency_events.EndTime.max()
print(ending_point)
## create a range of time stamps incremented by 1 minute from starting point to ending point
days = pd.date_range(starting_point, ending_point, freq='min')

## create a pandas dataframe with two columns: calendar for time stamps and a place holder of 0 for trucks working 
coincident = pd.DataFrame({'calendar_timestamp': days, 'TrucksWorking': 0})
## print it out to verify the data
print(coincident.head(n = 5))

# =============================================================================
# Step 3 --- now for the difficult part
# if a "coincident" time stamp is between a start and end time of an emergency_event
# increment the TrucksWorking column by 1
# loop through every "coincident" observation and test if it is between a start and an end of an "emergency_event"
# =============================================================================
for each in coincident.calendar_timestamp:
    if (coincident[coincident['calendar_timestamp']] >= emergency_events[emergency_events['StartTime']] & coincident[coincident['calendar_timestamp']] <= emergency_events[emergency_events['EndTime']]):
        coincident[coincident['TrucksWorking']] = coincident[coincident['TrucksWorking']] + 1
    else:
        coincident[coincident['TrucksWorking']]

## at the end of this program it should return a calendar of date time stamps with 
## the number of ambulances at work during that one minute interval.
## this information can be used for data modeling.

# =============================================================================
# I have attempted the following
# the following code returns an error message
# ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
# =============================================================================
## for each in coincident.calendar_timestamp:
##     if (coincident[coincident['calendar_timestamp'].between(starting_point, ending_point)]):
##         coincident[coincident['TrucksWorking']] = coincident[coincident['TrucksWorking']] + 1
##     else:
##         coincident[coincident['TrucksWorking']]

# =============================================================================
# I have attempted the following
# a dead end code that I cannot make work
# df = coincident[coincident['calendar_timestamp'].between(starting_point, ending_point)]
# print(df.head(n = 5))
# =============================================================================

# =============================================================================
# I have attempted the following but it will not work
# another dead end code
# for timestamp in coincident_events.calendar:
#     print(coincident_events.calendar.query('coincident_events.calendar >= emergency_events.starting_point and coincident_events.calendar <= emergency_events.ending_point'))
# =============================================================================

print(coincident.head(n = 20))

# =============================================================================
# Step 4: verify the "coincident" table is correct and then analyze the data
# Printing the "coincident" dataframe should look something like:
# =============================================================================
#     calendar_timestamp              TrucksWorking    
# 0  2020-01-01 00:00:28              1
# 1  2020-01-01 00:01:28              1
# 2  2020-01-01 00:02:28              1
# 3  2020-01-01 00:03:28              2
# 4  2020-01-01 00:04:28              3
# 5  2020-01-01 00:05:28              3
# 6  2020-01-01 00:06:28              4
# 7  2020-01-01 00:07:28              4
# 8  2020-01-01 00:08:28              4
# 9  2020-01-01 00:09:28              4
# 10 2020-01-01 00:10:28              4
# etc for a full calendar year of ambulance responses

# =============================================================================
# Step 5: analyze the data looking for patterns of ambulance utilization. TBD
# =============================================================================

1 answer

Community user
Reply #1 · posted 2024-07-02 13:41:59

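Using your data, I found the following solution. I only used the first 200 minutes of 2020, but you can easily change that by setting periods=200 to the number of minutes in a year.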

I used the following variables: df corresponds to your coincident dataframe. Starting from 2020-01-01, I generate one row per minute:

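import pandas as pd

# one row per minute, starting at midnight on 2020-01-01 (first 200 minutes only)
df = pd.DataFrame()
df['time1'] = pd.date_range('2020-01-01 00:00:00', periods=200, freq='min')
# placeholder count, filled in by the loop further down
df['trucks working'] = 0
print(df)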

This gives me every minute in the range with 'trucks working' still at 0:

                  time1  trucks working
0   2020-01-01 00:00:00               0
1   2020-01-01 00:01:00               0
2   2020-01-01 00:02:00               0
3   2020-01-01 00:03:00               0
4   2020-01-01 00:04:00               0
..                  ...             ...
195 2020-01-01 03:15:00               0
196 2020-01-01 03:16:00               0
197 2020-01-01 03:17:00               0
198 2020-01-01 03:18:00               0
199 2020-01-01 03:19:00               0
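To cover the whole of 2020 instead of 200 minutes, `periods` can be computed from the year boundaries rather than hard-coded. 2020 is a leap year, so this comes to 366 × 24 × 60 = 527,040 minutes. A small sketch:

```python
import pandas as pd

# Minutes between the start of 2020 and the start of 2021 (2020 is a leap year)
minutes_2020 = int((pd.Timestamp('2021-01-01') - pd.Timestamp('2020-01-01'))
                   / pd.Timedelta(minutes=1))

df = pd.DataFrame()
df['time1'] = pd.date_range('2020-01-01 00:00:00', periods=minutes_2020, freq='min')
df['trucks working'] = 0
print(minutes_2020)          # 527040
print(df['time1'].iloc[-1])  # 2020-12-31 23:59:00
```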

Using your emergency calls as data:

data = [['2020-01-01 00:00:28.000','2020-01-01 00:35:28.987']
    , ['2020-01-01 00:02:34.000','2020-01-01 01:05:13.540']
    , ['2020-01-01 00:03:57.000','2020-01-01 01:14:44.537']
    , ['2020-01-01 00:06:17.000','2020-01-01 01:26:52.087']
    , ['2020-01-01 00:13:20.000','2020-01-01 01:17:31.310']
    , ['2020-01-01 00:14:01.000','2020-01-01 01:57:28.343']
    , ['2020-01-01 00:16:11.000','2020-01-01 00:39:34.967']
    , ['2020-01-01 00:22:03.000','2020-01-01 01:46:40.037']
    , ['2020-01-01 00:23:07.000','2020-01-01 00:49:25.890']
    , ['2020-01-01 00:23:19.000','2020-01-01 01:26:39.920']
    , ['2020-01-01 00:30:17.000','2020-01-01 00:38:05.000']] 

I add the column names and call the resulting dataframe emergency_events:

emergency_events = pd.DataFrame(data, columns = ['StartTime', 'EndTime'])

Now I can iterate over the emergency_events dataframe and, for every minute of the day that falls between a row's 'StartTime' and 'EndTime', increment 'trucks working':

for index2, row2 in df.iterrows():
    for index, row in emergency_events.iterrows():
        # count this call if the minute lies within [StartTime, EndTime], both ends inclusive
        if pd.to_datetime(row['StartTime']) <= pd.to_datetime(row2['time1']) <= pd.to_datetime(row['EndTime']):
            #print(row2['trucks working'])
            #print(row['StartTime'],row2['time1'],row['EndTime'])
            df.at[index2,'trucks working'] += 1

This gives me a dataframe with the number of trucks working in every minute of the day:

                  time1  trucks working
0   2020-01-01 00:00:00               0
1   2020-01-01 00:01:00               1
2   2020-01-01 00:02:00               1
3   2020-01-01 00:03:00               2
4   2020-01-01 00:04:00               3
..                  ...             ...
195 2020-01-01 03:15:00               0
196 2020-01-01 03:16:00               0
197 2020-01-01 03:17:00               0
198 2020-01-01 03:18:00               0
199 2020-01-01 03:19:00               0
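The nested `iterrows()` loop compares every minute with every call, so it will be slow for a full year of data. A possible alternative (a swapped-in technique, not the method used in this answer) is a sorted-search count: at time t, the number of active calls equals the calls that started at or before t minus the calls that ended before t. This sketch assumes every `EndTime` is at or after its `StartTime` and keeps both ends inclusive, like the loop above.

```python
import numpy as np
import pandas as pd

# The sample calls from the question
data = [['2020-01-01 00:00:28.000', '2020-01-01 00:35:28.987'],
        ['2020-01-01 00:02:34.000', '2020-01-01 01:05:13.540'],
        ['2020-01-01 00:03:57.000', '2020-01-01 01:14:44.537'],
        ['2020-01-01 00:06:17.000', '2020-01-01 01:26:52.087'],
        ['2020-01-01 00:13:20.000', '2020-01-01 01:17:31.310'],
        ['2020-01-01 00:14:01.000', '2020-01-01 01:57:28.343'],
        ['2020-01-01 00:16:11.000', '2020-01-01 00:39:34.967'],
        ['2020-01-01 00:22:03.000', '2020-01-01 01:46:40.037'],
        ['2020-01-01 00:23:07.000', '2020-01-01 00:49:25.890'],
        ['2020-01-01 00:23:19.000', '2020-01-01 01:26:39.920'],
        ['2020-01-01 00:30:17.000', '2020-01-01 00:38:05.000']]
emergency_events = pd.DataFrame(data, columns=['StartTime', 'EndTime'])
emergency_events['StartTime'] = pd.to_datetime(emergency_events['StartTime'])
emergency_events['EndTime'] = pd.to_datetime(emergency_events['EndTime'])

# Same minute grid as the answer (first 200 minutes of 2020)
df = pd.DataFrame()
df['time1'] = pd.date_range('2020-01-01 00:00:00', periods=200, freq='min')

# Sort starts and ends independently; only how many fall before t matters.
# Casting to datetime64[ns] keeps the units consistent for searchsorted.
starts = np.sort(emergency_events['StartTime'].to_numpy(dtype='datetime64[ns]'))
ends = np.sort(emergency_events['EndTime'].to_numpy(dtype='datetime64[ns]'))
t = df['time1'].to_numpy(dtype='datetime64[ns]')

# side='right' counts starts <= t; side='left' counts ends < t
started = np.searchsorted(starts, t, side='right')
ended = np.searchsorted(ends, t, side='left')
df['trucks working'] = started - ended
print(df.head())
```

Sorting costs O(n log n) for n calls and each minute then needs only two binary searches, instead of one comparison per call.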
