How can I use Python on Windows to merge multiple log files into a single log file, sorted by timestamp?

Posted 2024-10-01 15:38:11


Can someone help me with the following problem?

Input file 1:

abc.exe TryEndHand [520] 30-4-2020 8:8:52.786  [3636] Handshake value
Executing end handlier

abc.exe QueueSharedCreateNamed [488] 30-5-2020 8:8:51.583  [3760] Create the general message 
Error Occured!! 30-4-2020 8:9:29.93  [2932] WARNING cannot remove qid
def.exe SharedCreateNamed [488] 30-5-2020 8:8:51.584  [3636] Create the general different message 

Input file 2:

abc1.exe TryEndHand [520] 30-5-2020 8:8:51.786  [3636] Handshake value 
abc1.exe QueueSharedCreateNamed [488] 30-5-2020 8:8:52.58  [3760] Create the general message 
def1.exe SharedCreateNamed [488] 30-5-2020 8:8:51.53  [3636] Create the general different message

Likewise:

Input file N: ...........

Output file (1, 2, ..., N):

abc.exe TryEndHand [520] 30-4-2020 8:8:52.786  [3636] Handshake value 
Executing end handlier
Error Occured!! 30-4-2020 8:9:29.93  [2932] WARNING cannot remove qid
abc.exe QueueSharedCreateNamed [488] 30-5-2020 8:8:51.583  [3760] Create the general message 
abc1.exe TryEndHand [520] 30-5-2020 8:8:51.786  [3636] Handshake value
def1.exe SharedCreateNamed [488] 30-5-2020 8:8:51.53  [3636] Create the general different message 
def.exe SharedCreateNamed [488] 30-5-2020 8:8:51.584  [3636] Create the general different message 
abc1.exe QueueSharedCreateNamed [488] 30-5-2020 8:8:52.58  [3760] Create the general message 

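Please help me generate a sorted log based on the date and timestamp in each line of the output file. Note that the input files contain blank lines and statements without timestamps, as well as error cases.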


2 answers

Use datefinder to extract the datetimes and pandas to sort by them.

import os
# pip install datefinder
import datefinder
# pip install pandas
import pandas as pd

PATH = './input_files/'
# a dictionary with key - datetime and values - substrings
df_dict = dict()
for in_file in os.listdir(PATH):
    # read input files one by one, closing each file after reading
    with open(os.path.join(PATH, in_file)) as fh:
        input_file = fh.read()
    # to store prev datetime to handle lines with no datetime
    prev_dt = 0
    for line in input_file.splitlines():
        # parse lines one by one
        if line.strip():
            pre, dt, post, raw_dt = '', 0, '', 0
            # using datefinder to extract datetime
            for match in datefinder.find_dates(line, index=True, source=True):
                # change valid year conditions according to use case
                if match[0].year == 2020:
                    dt, pre, raw_dt, post  = match[1], line[:match[2][0]], match[1], line[match[2][1]:]
            if dt:
                prev_dt = dt
                df_dict[dt] = [pre, raw_dt, post]
            else:
                df_dict[prev_dt].append(line)
df = pd.DataFrame.from_dict(df_dict, orient='index')
df.index = pd.to_datetime(df.index)
next_lines = df.pop(3)
df = pd.concat([df,next_lines]).dropna(how='all').fillna('').sort_index()

output = df[df.columns].apply(lambda x: ' '.join(x), axis=1)
output.to_csv('output.txt', header=False, index=None)

Output:

abc.exe TryEndHand [520] 30-4-2020 8:8:52.786 [3636] Handshake value
Executing end handlier  
Error Occured!! 30-4-2020 8:9:29.93 [2932] WARNING cannot remove qid
def1.exe SharedCreateNamed [488] 30-5-2020 8:8:51.53 [3636] Create the general different message
abc.exe QueueSharedCreateNamed [488] 30-5-2020 8:8:51.583 [3760] Create the general message 
def.exe SharedCreateNamed [488] 30-5-2020 8:8:51.584 [3636] Create the general different message
abc1.exe TryEndHand [520] 30-5-2020 8:8:51.786 [3636] Handshake value 
abc1.exe QueueSharedCreateNamed [488] 30-5-2020 8:8:52.58 [3760] Create the general message

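For each line in the two files, use a regular expression to extract the timestamp from the line, convert the timestamp to a datetime.datetime object, and sort the lines by those datetime.datetime objects: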

def main():

    import re
    from datetime import datetime

    with open("log1.txt", "r") as log_1, open("log2.txt", "r") as log_2:
        all_lines = log_1.read().splitlines() + log_2.read().splitlines()

    # \s+ tolerates the one- or two-space gap between the timestamp and the next "["
    for line in sorted(all_lines, key=lambda s: datetime.strptime(re.search(r"\] ([^\[]+?)\s+\[", s).group(1), "%d-%m-%Y %H:%M:%S.%f")):
        print(line)

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

abc1.exe TryEndHand [520] 30-4-2020 8:8:51.786 [3636] Handshake value from driver = 1
abc.exe TryEndHand [520] 30-4-2020 8:8:52.786 [3636] Handshake value from driver = 1
def1.exe SharedCreateNamed [488] 30-5-2020 8:8:51.53 [3636] Create the general different message
abc.exe QueueSharedCreateNamed [488] 30-5-2020 8:8:51.583 [3760] Create the general message
def.exe SharedCreateNamed [488] 30-5-2020 8:8:51.584 [3636] Create the general different message
abc1.exe QueueSharedCreateNamed [488] 30-5-2020 8:8:52.58 [3760] Create the general message

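Unfortunately, your data is incorrect. The first line of the second log file contains a nonexistent date: 31 April 2020. The code I posted only works because I changed that line to the 30th.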
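To illustrate the point about the invalid date, here is a minimal sketch (not part of the original answer) of how datetime.strptime treats these timestamps with the same format string. The helper name parse_ts is hypothetical. It returns None for a nonexistent date instead of raising, and the last line shows that %f reads ".53" as 530 milliseconds, which is why an "8:8:51.53" entry sorts before "8:8:51.583".

```python
from datetime import datetime

FMT = "%d-%m-%Y %H:%M:%S.%f"  # same format string as in the answer's code


def parse_ts(text):
    """Hypothetical helper: return a datetime, or None if the text is not a valid timestamp."""
    try:
        return datetime.strptime(text, FMT)
    except ValueError:
        # Raised for dates that do not exist, e.g. 31 April
        return None


print(parse_ts("30-4-2020 8:8:51.786"))
print(parse_ts("31-4-2020 8:8:51.786"))  # April has only 30 days, so this prints None
# %f pads on the right: ".53" means 530000 microseconds, ".583" means 583000
print(parse_ts("30-5-2020 8:8:51.53") < parse_ts("30-5-2020 8:8:51.583"))
```

Whether to skip, log, or repair such lines depends on the use case; returning None simply makes the decision explicit at the call site.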

Edit: for multiple files, you can use contextlib.ExitStack as the context manager:

def main():

    from pathlib import Path
    from contextlib import ExitStack

    with ExitStack() as stack:
        def get_line():
            for file in (stack.enter_context(path.open()) for path in Path("logs/").glob("*.txt")):
                for line in file.read().splitlines():
                    yield line
        all_lines = list(get_line())
    print(all_lines)

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Edit: thanks for the new log files. Putting it all together:

def main():

    from pathlib import Path
    from contextlib import ExitStack
    import re
    from datetime import datetime

    with ExitStack() as stack:
        def get_line():
            for file in (stack.enter_context(path.open()) for path in Path("logs/").glob("*.txt")):
                for line in filter(None, file.read().splitlines()):
                    yield line
        all_lines = list(get_line())

    pattern = "(?P<timestamp>{}-{}-{} {}:{}:{}\\.{})".format(*["\\d+"] * 7)
    strptime_fmt = "%d-%m-%Y %H:%M:%S.%f"

    def get_group():
        group = []
        for line in all_lines:
            match = re.search(pattern, line)
            if group:
                if match is None:
                    group.append(line)
                else:
                    yield group
                    group = [line]
            else:
                if match is not None:
                    group.append(line)
        if group:  # avoid yielding an empty group when no line has a timestamp
            yield group

    for group in sorted(list(get_group()), key=lambda g: datetime.strptime(re.search(pattern, g[0]).group("timestamp"), strptime_fmt)):
        for line in group:
            print(line)
    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

abc.exe TryEndHand [520] 30-4-2020 8:8:52.786  [3636] Handshake value
Executing end handlier
Error Occured!! 30-4-2020 8:9:29.93  [2932] WARNING cannot remove qid
def1.exe SharedCreateNamed [488] 30-5-2020 8:8:51.53  [3636] Create the general different message
abc.exe QueueSharedCreateNamed [488] 30-5-2020 8:8:51.583  [3760] Create the general message 
def.exe SharedCreateNamed [488] 30-5-2020 8:8:51.584  [3636] Create the general different message
abc1.exe TryEndHand [520] 30-5-2020 8:8:51.786  [3636] Handshake value 
abc1.exe QueueSharedCreateNamed [488] 30-5-2020 8:8:52.58  [3760] Create the general message 
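
The question asks for a single merged file, whereas the code above prints to the console. Below is a minimal sketch of writing the sorted groups to a file instead, assuming the same timestamp pattern and format. The function name merge_logs and its parameters are illustrative and not from the original answer. Unlike the code above, it resets the current group at each file boundary, so an untimestamped line at the top of one file is dropped rather than attached to the last entry of the previous file. A nonexistent date would still raise ValueError here.

```python
import re
from datetime import datetime
from pathlib import Path

# Same shape as the answer's pattern: d-m-Y H:M:S.fraction
PATTERN = re.compile(r"\d+-\d+-\d+ \d+:\d+:\d+\.\d+")
FMT = "%d-%m-%Y %H:%M:%S.%f"


def merge_logs(paths, out_path):
    """Illustrative helper: group each timestamped line with the untimestamped
    lines that follow it, sort the groups by timestamp, write them to out_path,
    and return the number of groups written."""
    groups = []  # list of (datetime, [lines])
    for path in paths:
        current = None  # reset per file so a group never spans two files
        for line in Path(path).read_text().splitlines():
            if not line.strip():
                continue  # skip blank lines
            match = PATTERN.search(line)
            if match:
                current = (datetime.strptime(match.group(), FMT), [line])
                groups.append(current)
            elif current is not None:
                current[1].append(line)  # continuation of the previous entry
            # lines before the first timestamp in a file are dropped
    groups.sort(key=lambda g: g[0])  # list.sort is stable, so ties keep input order
    with open(out_path, "w") as out:
        for _, lines in groups:
            out.write("\n".join(lines) + "\n")
    return len(groups)
```

For example, merge_logs(sorted(Path("logs").glob("*.txt")), "merged.log") would merge every .txt file under a logs directory into merged.log; both names are placeholders.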