Splitting txt file data into 24-hour chunks based on timestamps

Posted 2024-09-28 17:03:49


I have a txt file in the following format:

Event A       15MAR18 103000       15MAR18 103758    
Event A       16MAR18 120518       16MAR18 121308  
Event B       16MAR18 121203       16MAR18 124543   
Event B       16MAR18 134443       16MAR18 141823 
Event B       16MAR18 151733       16MAR18 155103   
Event B       17MAR18 165013       17MAR18 172343       
Event B       17MAR18 182253       17MAR18 185623     
Event B       17MAR18 195533       17MAR18 202903 
Event A       17MAR18 203738       17MAR18 204028     
Event B       18MAR18 212813       18MAR18 220143     
Event A       18MAR18 221058       18MAR18 222338      
Event B       18MAR18 230103       18MAR18 233423    
Event A       19MAR18 234728       19MAR18 000048       
Event B       20MAR18 003343       20MAR18 010703   
Event A       20MAR18 012508       20MAR18 013418      
Event B       21MAR18 020623       21MAR18 023943       
Event B       21MAR18 033903       21MAR18 041223      
Event B       21MAR18 051143       21MAR18 054503     
Event B       21MAR18 064433       21MAR18 071743     
Event A       22MAR18 074058       22MAR18 075008   
Event B       22MAR18 081713       22MAR18 085023      
Event A       23MAR18 091438       23MAR18 092738     
Event B       23MAR18 094953       23MAR18 102303      
Event A       23MAR18 105148       23MAR18 110418  

I am trying to split the file into groups based on a 24-hour time difference in the middle column.

For example, the first line, 15MAR18 103000, would be a separate list of its own.

The second line would start a different list, because the timedelta is > 24 hours. That list would group the lines from 16MAR18 120518 to 16MAR18 151733, and so on.

My attempt is below:

from datetime import datetime, timedelta

List_Segment_1 = []

with open('file.txt', 'r') as input_file:
    input_file = input_file.readlines()

# The start timestamp occupies columns 14-27 of each line as posted.
startTime = datetime.strptime(input_file[0][14:28], '%d%b%y %H%M%S')
endTime = startTime + timedelta(hours=24)

# Collects only the first 24-hour segment.
for line in input_file:
    dates = datetime.strptime(line[14:28], '%d%b%y %H%M%S')

    if startTime <= dates < endTime:
        List_Segment_1.append(line)

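I don't know how to do this for the rest of the lines; I only get the first "segment". The real txt file has hundreds of lines. Maybe there is a better way to split the data, using a dictionary?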

Thanks for your help. Ideally without pandas or any other third-party library.

The output should look like this:

Event A       15MAR18 103000       15MAR18 103758      Segment1
Event A       16MAR18 120518       16MAR18 121308      Segment2 
Event B       16MAR18 121203       16MAR18 124543      Segment2
Event B       16MAR18 134443       16MAR18 141823      Segment2
Event B       16MAR18 151733       16MAR18 155103      Segment2
Event B       17MAR18 165013       17MAR18 172343      Segment3
Event B       17MAR18 182253       17MAR18 185623      Segment3
Event B       17MAR18 195533       17MAR18 202903      Segment3
Event A       17MAR18 203738       17MAR18 204028      Segment3
Event B       18MAR18 212813       18MAR18 220143      Segment4
Event A       18MAR18 221058       18MAR18 222338      Segment4
Event B       18MAR18 230103       18MAR18 233423      Segment4
Event A       19MAR18 234728       19MAR18 000048      Segment5
Event B       20MAR18 003343       20MAR18 010703      Segment5
Event A       20MAR18 012508       20MAR18 013418      Segment5
Event B       21MAR18 020623       21MAR18 023943      Segment6 
Event B       21MAR18 033903       21MAR18 041223      Segment6
Event B       21MAR18 051143       21MAR18 054503      Segment6
Event B       21MAR18 064433       21MAR18 071743      Segment6
Event A       22MAR18 074058       22MAR18 075008      Segment6
Event B       22MAR18 081713       22MAR18 085023      Segment7
Event A       23MAR18 091438       23MAR18 092738      Segment8
Event B       23MAR18 094953       23MAR18 102303      Segment8
Event A       23MAR18 105148       23MAR18 110418      Segment8

Answers

Here is a naive implementation for your problem; modify it as needed:

from datetime import datetime, timedelta

with open('file.txt', 'r') as input_file:
    lines = input_file.readlines()

base_time = datetime.strptime(lines[0][14:28], '%d%b%y %H%M%S')
end_time = base_time + timedelta(hours=24)
segment = 1

for line in lines:
    date = datetime.strptime(line[14:28], '%d%b%y %H%M%S')

    if base_time <= date < end_time:
        pass
    else:
        segment += 1
        base_time = date
        end_time = date + timedelta(hours=24)

    print(line.strip()  + '\tSegment {}'.format(segment))

This snippet outputs:

Event A       15MAR18 103000       15MAR18 103758       Segment 1
Event A       16MAR18 120518       16MAR18 121308       Segment 2
Event B       16MAR18 121203       16MAR18 124543       Segment 2
Event B       16MAR18 134443       16MAR18 141823       Segment 2
Event B       16MAR18 151733       16MAR18 155103       Segment 2
Event B       17MAR18 165013       17MAR18 172343       Segment 3
Event B       17MAR18 182253       17MAR18 185623       Segment 3
Event B       17MAR18 195533       17MAR18 202903       Segment 3
Event A       17MAR18 203738       17MAR18 204028       Segment 3
Event B       18MAR18 212813       18MAR18 220143       Segment 4
Event A       18MAR18 221058       18MAR18 222338       Segment 4
Event B       18MAR18 230103       18MAR18 233423       Segment 4
Event A       19MAR18 234728       19MAR18 000048       Segment 5
Event B       20MAR18 003343       20MAR18 010703       Segment 5
Event A       20MAR18 012508       20MAR18 013418       Segment 5
Event B       21MAR18 020623       21MAR18 023943       Segment 6
Event B       21MAR18 033903       21MAR18 041223       Segment 6
Event B       21MAR18 051143       21MAR18 054503       Segment 6
Event B       21MAR18 064433       21MAR18 071743       Segment 6
Event A       22MAR18 074058       22MAR18 075008       Segment 7
Event B       22MAR18 081713       22MAR18 085023       Segment 7
Event A       23MAR18 091438       23MAR18 092738       Segment 8
Event B       23MAR18 094953       23MAR18 102303       Segment 8
Event A       23MAR18 105148       23MAR18 110418       Segment 8
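
Since the question also asks whether a dictionary could hold the segments, here is a minimal sketch of the same window logic that collects the lines instead of printing them. The names `parse_start` and `segment_lines` are hypothetical, and the sketch assumes the lines are already in chronological order and that the last four whitespace-separated fields are start date, start time, end date and end time.

```python
from datetime import datetime, timedelta

def parse_start(line):
    """Return the start timestamp (middle column) of one record."""
    # Assumption: the last four whitespace-separated fields are
    # start date, start time, end date, end time.
    fields = line.split()
    return datetime.strptime(fields[-4] + ' ' + fields[-3], '%d%b%y %H%M%S')

def segment_lines(lines, window=timedelta(hours=24)):
    """Group lines into windows anchored at the first line of each group."""
    segments = {}
    segment = 0
    end_time = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        start = parse_start(line)
        # Assumption: input is sorted, so only the upper bound is checked.
        if end_time is None or start >= end_time:
            segment += 1
            end_time = start + window
            segments[segment] = []
        segments[segment].append(line.rstrip())
    return segments

sample = [
    'Event A       15MAR18 103000       15MAR18 103758',
    'Event A       16MAR18 120518       16MAR18 121308',
    'Event B       16MAR18 121203       16MAR18 124543',
    'Event B       17MAR18 165013       17MAR18 172343',
]
for number, rows in segment_lines(sample).items():
    for row in rows:
        print('{}\tSegment {}'.format(row, number))
```

Splitting on whitespace rather than slicing fixed columns is a design choice that tolerates uneven padding between columns; it would need adjusting if the timestamp columns themselves could be missing.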

Assuming the day of the month is zero-padded (01-31, not 1-31), I wrote a solution based on string slicing. You could also use datetime with the same logic.

from pprint import pprint

with open('file.txt', 'r') as input_file:
    lines = input_file.readlines()

previous_day = int(lines[0][14:16])  # day of month on the first line
segments = []
day_data = []
for line in lines:
    current_day = int(line[14:16])
    if current_day != previous_day:
        # new calendar day: store the finished group and start a new one
        segments.append(day_data)
        day_data = []
        previous_day = current_day
    day_data.append(line)

if day_data:
    segments.append(day_data)  # keep the last group as well

pprint(segments)
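
The remark about using datetime with this logic could look roughly like the sketch below. It is an assumption, not the answerer's code: it parses the full start timestamp and groups consecutive lines by calendar date with `itertools.groupby`, so a change of month is handled as well. The helper names `start_date` and `group_by_day` and the sample rows are made up for illustration. Note that this groups by calendar day, which is not the same as a 24-hour window measured from the first line of each group.

```python
from datetime import datetime
from itertools import groupby

def start_date(line):
    """Calendar date of the start timestamp (second column)."""
    # Assumption: fixed-width layout, timestamp at columns 14-27.
    return datetime.strptime(line[14:28], '%d%b%y %H%M%S').date()

def group_by_day(lines):
    """Return one list of lines per run of consecutive same-day lines."""
    return [list(rows) for _, rows in groupby(lines, key=start_date)]

# Hypothetical rows that cross a month boundary.
sample = [
    'Event A       31MAR18 234728       01APR18 000048',
    'Event B       01APR18 003343       01APR18 010703',
    'Event A       01APR18 012508       01APR18 013418',
]
for group in group_by_day(sample):
    print(group)
```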

Fairly old-fashioned code, but it works. The output is a dictionary.

import datetime

mydict = {}
l_num = 1
with open('file.txt', 'r') as input_file:
    input_file = input_file.readlines()


for i in range(len(input_file)):
    if i == 0:
        mydict['Segment ' + str(l_num)] = [input_file[i]]
    else:
        prevDate = datetime.datetime.strptime(input_file[i-1].split('       ')[1], '%d%b%y %H%M%S')
        Date = datetime.datetime.strptime(input_file[i].split('       ')[1], '%d%b%y %H%M%S')
        if Date - prevDate > datetime.timedelta(hours = 24):
            l_num += 1
            mydict['Segment ' + str(l_num)] = []
            mydict['Segment ' + str(l_num)].append(input_file[i])
        else:
            mydict['Segment ' + str(l_num)].append(input_file[i])

Just noticed: I am using Python 2. I am not sure whether it will work properly in Python 3, though I hope it does.
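
One point worth checking before choosing between the answers: this one starts a new segment when the gap between two consecutive lines exceeds 24 hours, whereas a window-based approach starts a new segment when a line falls outside a 24-hour window anchored at the first line of the current segment. On the sample data in the question the two rules appear to produce the same eight groups, but they can differ on other data. The sketch below uses made-up timestamps and hypothetical function names (`by_gap`, `by_window`) to show the difference.

```python
from datetime import datetime, timedelta

DAY = timedelta(hours=24)

def by_gap(times):
    """New segment when the gap to the previous timestamp exceeds 24 h."""
    labels, segment = [], 1
    for i, t in enumerate(times):
        if i > 0 and t - times[i - 1] > DAY:
            segment += 1
        labels.append(segment)
    return labels

def by_window(times):
    """New segment when a timestamp falls outside the current 24 h window."""
    labels, segment = [], 0
    end = None
    for t in times:
        if end is None or t >= end:
            segment += 1
            end = t + DAY
        labels.append(segment)
    return labels

# Hypothetical timestamps spaced 20 hours apart.
start = datetime(2018, 3, 15, 10, 30)
times = [start + timedelta(hours=20 * k) for k in range(3)]
print(by_gap(times))     # [1, 1, 1]
print(by_window(times))  # [1, 1, 2]
```

With 20-hour spacing no single gap exceeds 24 hours, so the gap rule keeps everything in one segment, while the window rule closes the first segment 24 hours after its first timestamp.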
