How to join two separate text data files and align the data to different time intervals for averaging

Posted 2024-10-04 05:21:44


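I have two text data files from two different instruments. One is an elevator that moves up and down every 7 or 8 minutes. I need to match (align) the data from the other instrument with the times the elevator spends in its upper and lower positions (each lasting 7 or 8 minutes). Below is data from the instrument (Picarro) and from the elevator (AEM):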

One issue: the Picarro time is recorded in UTC, so its midnight is actually 6 pm local time, whereas the AEM data starts at local midnight.
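One way to remove the clock mismatch is to convert both sources to timezone-aware UTC datetimes before comparing anything. A minimal sketch, assuming the AEM clock runs on a fixed offset of UTC-6 (inferred from "UTC midnight is 6 pm local"; no daylight-saving changes are handled). The helper names `aem_local_to_utc` and `picarro_to_utc` are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the AEM logger uses a fixed local offset of UTC-6.
AEM_TZ = timezone(timedelta(hours=-6))


def aem_local_to_utc(text):
    """Parse an AEM local timestamp like '2014-06-24 00:16:39' and return it in UTC."""
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=AEM_TZ)
    return local.astimezone(timezone.utc)


def picarro_to_utc(date, time):
    """Combine the Picarro Date and Time columns (already UTC), keeping fractional seconds."""
    stamp = datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S.%f")
    return stamp.replace(tzinfo=timezone.utc)
```

For example, AEM local midnight `aem_local_to_utc("2014-06-24 00:00:00")` becomes 06:00 UTC on the same day, so it compares correctly against `picarro_to_utc("2014-06-24", "06:00:00.000")`.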

The position value indicates the upper position (364) or the lower position (233).
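Because the position column only takes two nominal values, a single cutoff is enough to label each reading. A sketch, assuming logged values sit at or near 364 and 233; splitting at their midpoint is an illustrative choice, and `position_label` and `THRESHOLD` are hypothetical names:

```python
TOP, BOTTOM = 364.0, 233.0
THRESHOLD = (TOP + BOTTOM) / 2  # midpoint between the two positions: 298.5


def position_label(value):
    """Return 'top' for readings nearer 364 and 'bottom' for readings nearer 233."""
    return "top" if float(value) >= THRESHOLD else "bottom"
```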

Instrument (Picarro)

Date            Time            NH3_Raw              
2014-06-24      00:00:01.134    3.3844673297E+000  
2014-06-24      00:00:03.210    3.1585870007E+000 
2014-06-24      00:00:05.293    3.2442662514E+000
2014-06-24      00:00:06.812    3.2442662514E+000
2014-06-24      00:00:08.335    3.1064987772E+000

Elevator (AEM)


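I would like to merge these two separate files and output them into a new list. From this new list I want to run statistical analysis on the data: means, standard deviations, and so on. But first I have to align the data to these time ranges. The AEM interval pattern appears to be 7, 8, 8, 7 minutes, then it repeats, so some kind of loop is needed, which I think is far beyond my Python skills. I want to build intervals following this pattern to bin the data.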
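A repeating 7, 8, 8, 7 minute pattern does not need an elaborate loop: `itertools.cycle` repeats the durations indefinitely. Below is a minimal sketch that generates interval boundaries from a known start time and then bins `(timestamp, value)` pairs into them. The names `build_intervals` and `bin_measurements`, the half-open `[start, stop)` convention, and the assumption that the elevator alternates positions starting from `first_position` are illustrative, not taken from the data:

```python
from datetime import datetime, timedelta
from itertools import cycle


def build_intervals(start, end, minutes=(7, 8, 8, 7), first_position="bottom"):
    """Yield (position, interval_start, interval_end) tuples covering start..end.

    Assumes the position flips at every boundary and the duration pattern
    repeats exactly; the last interval is truncated at `end`.
    """
    other = {"bottom": "top", "top": "bottom"}
    position, t = first_position, start
    for length in cycle(minutes):
        if t >= end:
            break
        stop = min(t + timedelta(minutes=length), end)
        yield position, t, stop
        position, t = other[position], stop


def bin_measurements(intervals, measurements):
    """Group (timestamp, value) pairs into the [start, stop) interval they fall in."""
    binned = []
    for position, t0, t1 in intervals:
        values = [v for ts, v in measurements if t0 <= ts < t1]
        binned.append((position, t0, values))
    return binned
```

For example, `list(build_intervals(datetime(2014, 6, 24), datetime(2014, 6, 24, 0, 30)))` yields four intervals of 7, 8, 8 and 7 minutes labelled bottom, top, bottom, top. A fixed schedule drifts if the elevator timing is not exact, so reading the boundaries from the AEM log itself (as the answer below does) is the more robust option when that log is available.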


Tags: data, text, list, datetime, data file, record, time
1 answer
User
#1 · Posted 2024-10-04 05:21:44

Here is one approach you could use:

import re
from datetime import datetime, timedelta

# Custom classes to hold your data.
class ElevatorInterval:
    def __init__(self, timestamp, record, loc_strt, loc_cut):
        timestamp = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        # AEM logs local time (UTC-6); add 6 hours so it lines up with the UTC Picarro clock.
        self.timestop = self.timestart = timestamp + timedelta(hours=6)
        self.measures = []
        # 364 = top position, 233 = bottom position; 300 separates the two.
        self.location = 'bottom' if float(loc_strt) < 300 else 'top'

class NH3Measure:
    def __init__(self, date, tim, nh3_raw):
        self.timestamp = datetime.strptime(date + tim, "%Y-%m-%d%H:%M:%S")
        self.raw = nh3_raw           # original text, used for printing
        self.nh3 = float(nh3_raw)    # numeric value, used for statistics
    def __repr__(self):
        return self.raw

# Read data from the files into elevator intervals and NH3 measures.
ele_intervals, nh3_measures = [], []
with open('aem.txt', 'r') as f:
    for line in f:
        # Expected line shape: "YYYY-MM-DD HH:MM:SS[.f]",record,loc_strt,loc_cut
        linematch = re.match(r'^"([0-9-]+\s[0-9:]+)(?:\.[0-9])?",([0-9]+),([0-9.]+),([0-9.]+)', line)
        if linematch:
            ele_intervals.append(ElevatorInterval(*linematch.groups()))
            if len(ele_intervals) > 1:
                # Close the previous interval 22 seconds before the next one starts.
                ele_intervals[-2].timestop = ele_intervals[-1].timestart - timedelta(seconds=22)
if ele_intervals:
    del ele_intervals[-1]  # The last interval has no stop time, so drop it.

with open('pic.txt', 'r') as f:
    for line in f:
        # Date, Time (fractional seconds discarded), NH3_Raw; the header line does not match.
        linematch = re.match(r'^([0-9-]+)\s+([0-9:]+)[0-9.]*\s+([0-9E.+-]+)', line)
        if linematch:
            nh3_measures.append(NH3Measure(*linematch.groups()))

# Assign NH3 measures to their proper interval, and output the intervals.
for ele in ele_intervals:
    # Build a real list (not a lazy filter object) so it prints and can be reused later.
    ele.measures = [m for m in nh3_measures if ele.timestart < m.timestamp < ele.timestop]
    print(ele.location, ele.measures)

With the sample input aem.txt:


pic.txt

Date            Time            NH3_Raw              
2014-06-24      00:16:39.134    3.3844673297E+000  
2014-06-24      00:16:41.210    3.1585870007E+000 
2014-06-24      00:16:43.293    3.2442662514E+000
2014-06-24      00:24:45.293    4.2442662514E+000
2014-06-24      00:24:47.812    4.4242662514E+000
2014-06-24      00:24:49.335    4.1064987772E+000
2014-06-24      00:31:45.293    3.2442662514E+000
2014-06-24      00:31:47.812    3.2442662514E+000
2014-06-24      00:31:49.335    3.1064987772E+000

Printed output:

bottom [3.3844673297E+000, 3.1585870007E+000, 3.2442662514E+000]
top [4.2442662514E+000, 4.4242662514E+000, 4.1064987772E+000]
bottom [3.2442662514E+000, 3.2442662514E+000, 3.1064987772E+000]
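Once each interval holds a list of numbers, Python's standard `statistics` module covers the means and standard deviations the question asks about. A minimal sketch with the values hard-coded from the printed output above instead of being read from files; with the answer's classes you could build the same pairs as `(ele.location, [float(m.nh3) for m in ele.measures])`. The helpers `summarize` and `summarize_by_position` are hypothetical names:

```python
from statistics import mean, stdev

# Values from the printed output above, parsed as floats.
intervals = [
    ("bottom", [3.3844673297, 3.1585870007, 3.2442662514]),
    ("top", [4.2442662514, 4.4242662514, 4.1064987772]),
    ("bottom", [3.2442662514, 3.2442662514, 3.1064987772]),
]


def summarize(intervals):
    """Return (position, n, mean, sample standard deviation) for each interval.

    Empty intervals get a NaN mean; intervals with fewer than two values get a NaN sd.
    """
    rows = []
    for position, values in intervals:
        avg = mean(values) if values else float("nan")
        sd = stdev(values) if len(values) > 1 else float("nan")
        rows.append((position, len(values), avg, sd))
    return rows


def summarize_by_position(intervals):
    """Pool every value recorded at the same position and return the pooled means."""
    pooled = {}
    for position, values in intervals:
        pooled.setdefault(position, []).extend(values)
    return {pos: mean(vals) for pos, vals in pooled.items() if vals}


for row in summarize(intervals):
    print("%-6s n=%d mean=%.4f sd=%.4f" % row)
print(summarize_by_position(intervals))
```

Per-interval rows keep the time structure (useful for spotting drift), while the pooled version gives one top and one bottom average for the whole run.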
