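Calculate the average temperature sensor reading over the 15 minutes before a survey was filled in (match timestamps + add new columns)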

# import raw files
import pandas as pd
import numpy as np

dfSD = pd.read_excel('IEQ_da.xlsx')
dfPIT = pd.read_excel('PIT_da.xlsx')

# Main aim: after each survey result row in PIT_da.xlsx, add columns for the
# average values of the indoor environmental quality parameters in the
# 15/30/60 minutes before the survey was submitted

# Step 0: set both timestamp and submitdate to the right datetime object
dfSD['timestamp'] = pd.to_datetime(dfSD['timestamp'], format='%d%b%Y:%H:%M:%S.%f')
dfPIT['submitdate'] = pd.to_datetime(dfPIT['submitdate'], format='%d%b%Y:%H:%M:%S.%f')

# Step 1: introduce arrays and convert to numpy
array1 = dfSD[['timestamp']].to_numpy().ravel()
array2 = dfPIT[['submitdate']].to_numpy().ravel()
data_sensorID = dfSD[['devid']].to_numpy().ravel()
survey_sensorID = dfPIT[['PIT5']].to_numpy().ravel()

Each survey has a timestamp (=submitdate) and should be matched to the sensor data at that timestamp.

# Step 2: convert timestamps to numbers (seconds since the epoch)
from datetime import datetime

def timestamps(x):
    Timestamps = np.empty(x.size)
    for i in range(x.size):
        date = x[i]
        dt64 = np.datetime64(date)
        timestamp = (dt64 - np.datetime64('1970-01-01T00:00:00Z')) / np.timedelta64(1, 's')
        Timestamps[i] = timestamp
    return Timestamps

array1TS = timestamps(array1)
array2TS = timestamps(array2)

# Step 3: define a match with conditions: must be the same timestamp and
# must have the same sensor ID, by means of a matrix
Match = np.empty([array1TS.size, array2TS.size])
for i in range(array1TS.size):
    for j in range(array2TS.size):
        if (data_sensorID[i] == survey_sensorID[j]):
            if (array1TS[i] == array2TS[j]):
                Match[i,j] = 1;
            else:
                Match[i,j] = 0;

import pandas as pd

df = pd.DataFrame({
    'timestamp': ['14/04/2020 00:18:00', '14/04/2020 00:18:05',
                  '14/04/2020 00:17:55', '14/04/2020 00:17:50',
                  '14/04/2020 00:17:40', '14/04/2020 00:17:40',
                  '14/04/2020 00:17:20', '14/04/2020 00:17:20'],
    'devid': ['4', '2', '4', '2', '4', '2', '4', '2'],
    'SENtemp': ['20,2', '18,8', '20,1', '19', '20,2', '18,8', '20,1', '18,9']})
df

1 Answer

User

#1 · Posted on 2024-09-29 23:26:55

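Your first two steps are fairly useless. You can build the new columns directly by using apply on dfPIT. The trickiest part is that SENtemp is a string column that uses a comma as the decimal separator, so it cannot be converted to float directly. Possible code: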

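delta = [15, 30, 60]  # window lengths in minutes
columns = [f'Average{i}' for i in delta]  # one column name per window length

# For each survey row, average SENtemp over the window (submitdate - i minutes, submitdate].
# 'min' is used as the unit because the 'T' alias is deprecated in recent pandas versions.
dfPIT[columns] = dfPIT.apply(axis=1, func=lambda x: pd.Series(
    [dfSD.loc[(dfSD['timestamp'] > x['submitdate'] - pd.Timedelta(i, 'min'))
              & (dfSD['timestamp'] <= x['submitdate']), 'SENtemp']
     .str.replace(',', '.').astype('float').mean() for i in delta],
    index=columns))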

With your sample data, it gives:

           submitdate PIT5  Average15  Average30  Average60
0 2020-04-14 00:18:00    4  19.614286  19.614286  19.614286
1 2020-04-14 00:18:05    2  19.512500  19.512500  19.512500
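
Note that the code above averages over the readings of all sensors inside the time window; the result for row 0 (19.614286) mixes devid 4 and devid 2. If each survey should only use readings from its own sensor, as Step 3 of the question suggests, the window filter can be extended with a condition on the sensor ID. The following is only a minimal sketch under some assumptions: the two dfPIT rows are rebuilt here from the output table, PIT5 is assumed to hold the same string IDs as devid, and window_mean is a hypothetical helper name, not something from the original post.

```python
import pandas as pd

# Sensor readings from the question: day-first timestamps, comma as decimal separator.
dfSD = pd.DataFrame({
    'timestamp': ['14/04/2020 00:18:00', '14/04/2020 00:18:05',
                  '14/04/2020 00:17:55', '14/04/2020 00:17:50',
                  '14/04/2020 00:17:40', '14/04/2020 00:17:40',
                  '14/04/2020 00:17:20', '14/04/2020 00:17:20'],
    'devid': ['4', '2', '4', '2', '4', '2', '4', '2'],
    'SENtemp': ['20,2', '18,8', '20,1', '19', '20,2', '18,8', '20,1', '18,9'],
})
dfSD['timestamp'] = pd.to_datetime(dfSD['timestamp'], format='%d/%m/%Y %H:%M:%S')
# Convert once up front instead of inside every window lookup.
dfSD['SENtemp'] = dfSD['SENtemp'].str.replace(',', '.', regex=False).astype(float)

# Assumption: survey rows reconstructed from the output table above,
# with PIT5 stored as strings so that it compares equal to devid.
dfPIT = pd.DataFrame({
    'submitdate': pd.to_datetime(['2020-04-14 00:18:00', '2020-04-14 00:18:05']),
    'PIT5': ['4', '2'],
})


def window_mean(row, minutes):
    """Mean SENtemp of the survey's own sensor in (submitdate - minutes, submitdate]."""
    start = row['submitdate'] - pd.Timedelta(minutes=minutes)
    mask = ((dfSD['devid'] == row['PIT5'])
            & (dfSD['timestamp'] > start)
            & (dfSD['timestamp'] <= row['submitdate']))
    return dfSD.loc[mask, 'SENtemp'].mean()


for minutes in (15, 30, 60):
    dfPIT[f'Average{minutes}'] = dfPIT.apply(window_mean, axis=1, minutes=minutes)

print(dfPIT)
```

With the sample data, every window contains all four readings of the matching sensor, so the three columns are identical per row. If devid and PIT5 have different dtypes in the real Excel files (for example int versus str), one of them would need to be cast before the comparison, otherwise the mask is always empty and the mean is NaN.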
