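I am trying to append new columns to an Excel file that contains survey data (PIT_da.xlsx). These columns should hold the average sensor values (e.g. temperature) over the 15, 30 and 60 minutes before each survey was filled in. The sensor data, including timestamps, is in the Excel file "IEQ_da.xlsx".
This is how I started: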
# Import the raw files
import pandas as pd
import numpy as np

dfSD = pd.read_excel('IEQ_da.xlsx')
dfPIT = pd.read_excel('PIT_da.xlsx')

# Main aim: for each survey row in PIT_da.xlsx, add columns with the average values of the
# indoor environmental quality parameters in the 15/30/60 minutes before the survey was submitted

# Step 0: convert timestamp and submitdate to datetime objects
dfSD['timestamp'] = pd.to_datetime(dfSD['timestamp'], format='%d%b%Y:%H:%M:%S.%f')
dfPIT['submitdate'] = pd.to_datetime(dfPIT['submitdate'], format='%d%b%Y:%H:%M:%S.%f')

# Step 1: pull the relevant columns out as 1-D NumPy arrays
array1 = dfSD['timestamp'].to_numpy()
array2 = dfPIT['submitdate'].to_numpy()
data_sensorID = dfSD['devid'].to_numpy()
survey_sensorID = dfPIT['PIT5'].to_numpy()

Each survey has a timestamp (=submitdate) and should be matched to the sensor data at that timestamp.
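One thing worth checking at this step: the sample rows given later in the question look like '14/04/2020 00:18:00', which the format string '%d%b%Y:%H:%M:%S.%f' would not parse. A minimal sketch, assuming the real files use the same day/month/year layout as the sample (the format string and the name ASSUMED_FORMAT are assumptions, not taken from the actual Excel files):

```python
import pandas as pd

# Assumed layout: day/month/year hour:minute:second, as in the sample rows.
ASSUMED_FORMAT = '%d/%m/%Y %H:%M:%S'

raw = pd.Series(['14/04/2020 00:18:00', '14/04/2020 00:17:55'])
parsed = pd.to_datetime(raw, format=ASSUMED_FORMAT)

# With an explicit format the day is read first, so this is 14 April 2020.
first = parsed.iloc[0]
```

If read_excel has already returned the column as datetimes, this conversion step may not be needed at all.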
The times are converted to numbers so that differences of 15/30/60 minutes can be calculated:
# Step 2: convert timestamps to numbers (seconds since the Unix epoch)
def timestamps(x):
    Timestamps = np.empty(x.size)
    for i in range(x.size):
        dt64 = np.datetime64(x[i])
        # No trailing 'Z' here: NumPy deprecates timezone suffixes in datetime64 strings
        Timestamps[i] = (dt64 - np.datetime64('1970-01-01T00:00:00')) / np.timedelta64(1, 's')
    return Timestamps

array1TS = timestamps(array1)
array2TS = timestamps(array2)
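The same conversion can be done on a whole column at once, without a Python loop. A small sketch on made-up values (the Series `stamps` is illustrative only):

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime(['1970-01-01 00:01:00', '2020-04-14 00:18:00']))

# Subtract the epoch and divide by one second; this works on the whole column.
epoch_seconds = (stamps - pd.Timestamp('1970-01-01')) / pd.Timedelta(seconds=1)
```

Since pandas can subtract and compare datetimes directly, the numeric conversion may not be needed for the 15/30/60 minute windows in the first place.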
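Next, each survey submit time is matched against the sensor timestamps (already rounded to the nearest 5 minutes). A match also requires the sensor device ID (=devid) to equal PIT5, the survey question that asks for the ID of the nearby sensor.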
# Step 3: build the match matrix: 1 where the sensor ID and the timestamp are both equal, 0 otherwise
# np.zeros rather than np.empty, so that cells that never match are not left uninitialised
Match = np.zeros([array1TS.size, array2TS.size])
for i in range(array1TS.size):
    for j in range(array2TS.size):
        if data_sensorID[i] == survey_sensorID[j] and array1TS[i] == array2TS[j]:
            Match[i, j] = 1
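The nested loops compare every sensor row with every survey row. The same exact-match pairs can be found with a join; a minimal sketch on tiny made-up frames shaped like the sample data (the frames `sensor` and `survey` are illustrative, and both ID columns are assumed to have the same dtype):

```python
import pandas as pd

sensor = pd.DataFrame({
    'timestamp': pd.to_datetime(['2020-04-14 00:18:00', '2020-04-14 00:18:05',
                                 '2020-04-14 00:17:55']),
    'devid': ['4', '2', '4'],
})
survey = pd.DataFrame({
    'submitdate': pd.to_datetime(['2020-04-14 00:18:00', '2020-04-14 00:18:05']),
    'PIT5': ['4', '2'],
})

# Inner join: keeps only pairs with equal sensor ID and equal timestamp,
# i.e. the cells that would be 1 in the Match matrix.
matches = survey.merge(sensor, left_on=['PIT5', 'submitdate'],
                       right_on=['devid', 'timestamp'], how='inner')
```

This avoids allocating a sensor-rows by survey-rows matrix, which can get large for long sensor logs.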
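Now, using this match, a new column should be added to "PIT_da.xlsx" that holds the average of the "SENtemp" column (the temperature values) in IEQ_da.xlsx over the 15 minutes before the matched timestamp.
The questions are:
1. How do I get from "Match" to selecting all rows in the 15 minutes before the matched timestamp?
2. How do I calculate the average of those selected rows (ignoring empty cells) and put it in a new column in PIT_da.xlsx? The new column should be named "SENtemp_15", to indicate the temperature in the 15 minutes before the survey was filled in.
For reference, here are a few rows of data: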
IEQ_da.xlsx
import pandas as pd
df = pd.DataFrame({
    'timestamp': ['14/04/2020 00:18:00', '14/04/2020 00:18:05', '14/04/2020 00:17:55',
                  '14/04/2020 00:17:50', '14/04/2020 00:17:40', '14/04/2020 00:17:40',
                  '14/04/2020 00:17:20', '14/04/2020 00:17:20'],
    'devid': ['4', '2', '4', '2', '4', '2', '4', '2'],
    'SENtemp': ['20,2', '18,8', '20,1', '19', '20,2', '18,8', '20,1', '18,9'],
})
df
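Note that the SENtemp values in this sample are strings with a comma as the decimal separator, so they cannot be averaged as they are. A small sketch of one way to convert them, assuming the real column looks like the sample (the names `sen` and `sen_float` are illustrative):

```python
import pandas as pd

sen = pd.Series(['20,2', '18,8', '19'])

# Swap the decimal comma for a point, then convert; unparseable cells become NaN.
sen_float = pd.to_numeric(sen.str.replace(',', '.', regex=False), errors='coerce')
```

If read_excel already returns SENtemp as a numeric column, this step can be skipped.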
PIT_da.xlsx
import pandas as pd
df = pd.DataFrame({
    'submitdate': ['14/04/2020 00:18:00', '14/04/2020 00:18:05'],
    'PIT5': ['4', '2'],
})
df
I hope someone is willing to help me.
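Your first two steps are mostly unnecessary. You can use apply directly on dfPIT to build the new column. The hardest part is that SENtemp is a string column that uses a comma as the decimal separator, so it cannot be converted to float directly.
Possible code:
With your sample data, it gives: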
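Neither the answer's code nor its output survives in this copy. As a stand-in, here is a minimal sketch of the approach the answer describes (apply on dfPIT, after converting the comma decimals). It is not the answerer's code. The timestamp format, the helper name window_mean, and the window boundaries (start excluded, submit time included) are all assumptions:

```python
import pandas as pd

# Sample rows from the question.
dfSD = pd.DataFrame({
    'timestamp': ['14/04/2020 00:18:00', '14/04/2020 00:18:05', '14/04/2020 00:17:55',
                  '14/04/2020 00:17:50', '14/04/2020 00:17:40', '14/04/2020 00:17:40',
                  '14/04/2020 00:17:20', '14/04/2020 00:17:20'],
    'devid': ['4', '2', '4', '2', '4', '2', '4', '2'],
    'SENtemp': ['20,2', '18,8', '20,1', '19', '20,2', '18,8', '20,1', '18,9'],
})
dfPIT = pd.DataFrame({'submitdate': ['14/04/2020 00:18:00', '14/04/2020 00:18:05'],
                      'PIT5': ['4', '2']})

fmt = '%d/%m/%Y %H:%M:%S'  # assumed from the sample rows
dfSD['timestamp'] = pd.to_datetime(dfSD['timestamp'], format=fmt)
dfPIT['submitdate'] = pd.to_datetime(dfPIT['submitdate'], format=fmt)
# Decimal comma -> point, then to float
dfSD['SENtemp'] = pd.to_numeric(
    dfSD['SENtemp'].str.replace(',', '.', regex=False), errors='coerce')


def window_mean(row, minutes, column='SENtemp'):
    """Mean of `column` for the survey's sensor in the `minutes` before submission."""
    start = row['submitdate'] - pd.Timedelta(minutes=minutes)
    in_window = (
        (dfSD['devid'] == row['PIT5'])
        & (dfSD['timestamp'] > start)                 # assumed: window start excluded
        & (dfSD['timestamp'] <= row['submitdate'])    # assumed: submit time included
    )
    return dfSD.loc[in_window, column].mean()  # mean() skips NaN by default


# One new column per window length: SENtemp_15, SENtemp_30, SENtemp_60
for minutes in (15, 30, 60):
    dfPIT[f'SENtemp_{minutes}'] = dfPIT.apply(window_mean, axis=1, minutes=minutes)
```

Because the sample rows span less than a minute, all three windows contain the same readings here. The result could then be written back with dfPIT.to_excel('PIT_da.xlsx', index=False).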