How to parse this custom log file in Python

My log file looks like this:

2015-05-22 16:46:46,985 - __main__ - INFO - Starting to Wait for Files
2015-05-22 16:46:56,645 - __main__ - INFO - Starting: Attempt 1 Checking for New Files from gs://folder/folder/
2015-05-22 16:47:46,488 - __main__ - INFO - Success: Downloading the Files from Cloud Storage: Return Code - 0 and FileCount 1
2015-05-22 16:48:48,180 - __main__ - ERROR - Failed: Waiting for files the Files from Cloud Storage: gs://folder/folder/
Traceback (most recent call last):
  File "<ipython-input-16-132cda1c011d>", line 10, in <module>
    if numFilesDownloaded == 0:
NameError: name 'numFilesDownloaded' is not defined
2015-05-22 16:49:17,918 - __main__ - INFO - Starting to Wait for Files
2015-05-22 16:49:32,160 - __main__ - INFO - Starting: Attempt 1 Checking for New Files from gs://folder/folder/
2015-05-22 16:49:39,329 - __main__ - INFO - Success: Downloading the Files from Cloud Storage: Return Code - 0 and FileCount 1
2015-05-22 16:53:30,706 - __main__ - INFO - Starting to Wait for Files

I would like to turn each entry, including any traceback lines that follow it, into a dict like this:

{ 'Date': '2015-05-22 16:46:46,985', 'Type': 'INFO', 'Message': 'Starting to Wait for Files' }
...
{ 'Date': '2015-05-22 16:48:48,180', 'Type': 'ERROR', 'Message': 'Failed: Waiting for files the Files from Cloud Storage: gs://folder/folder/ Traceback (most recent call last): File "<ipython-input-16-132cda1c011d>", line 10, in <module> if numFilesDownloaded == 0: NameError: name 'numFilesDownloaded' is not defined' }

3 Answers

User

Answer 1 · edited 2024-05-08 07:39:33

You can use groups to pull the fields you're looking for directly out of the regex. You can even name them:

>>> import re
>>> date_re = re.compile(r'(?P<a_year>\d{2,4})-(?P<a_month>\d{2})-(?P<a_day>\d{2}) (?P<an_hour>\d{2}):(?P<a_minute>\d{2}):(?P<a_second>\d{2}[.\d]*)')
>>> found = date_re.match('2016-02-29 12:34:56.789')
>>> if found is not None:
...     print(found.groupdict())
... 
{'a_year': '2016', 'a_month': '02', 'a_day': '29', 'an_hour': '12', 'a_minute': '34', 'a_second': '56.789'}
>>> found.groupdict()['a_month']
'02'

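Then create a date class whose constructor's keyword arguments match the group names. Use a little ** magic to create an instance directly from the regex groupdict, and you're cooking with gas. In the constructor you can check whether 2016 is a leap year and whether Feb 29 exists.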
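A minimal sketch of that idea, assuming Python 3. The class name LogDate and its validation logic are hypothetical, not part of the answer; it uses the standard calendar.monthrange() to check that the day exists, which covers the Feb 29 leap-year case.

```python
import calendar
import re

DATE_RE = re.compile(
    r'(?P<a_year>\d{2,4})-(?P<a_month>\d{2})-(?P<a_day>\d{2}) '
    r'(?P<an_hour>\d{2}):(?P<a_minute>\d{2}):(?P<a_second>\d{2}[.\d]*)'
)


class LogDate:
    """Hypothetical date class whose keyword arguments match the regex group names."""

    def __init__(self, a_year, a_month, a_day, an_hour, a_minute, a_second):
        self.year = int(a_year)
        self.month = int(a_month)
        self.day = int(a_day)
        self.hour = int(an_hour)
        self.minute = int(a_minute)
        self.second = float(a_second)
        # monthrange() returns (weekday of the 1st, number of days in the month),
        # so this rejects e.g. Feb 29 in a non-leap year.
        days_in_month = calendar.monthrange(self.year, self.month)[1]
        if not 1 <= self.day <= days_in_month:
            raise ValueError(f"{self.year}-{self.month:02d} has no day {self.day}")


found = DATE_RE.match('2016-02-29 12:34:56.789')
if found is not None:
    d = LogDate(**found.groupdict())  # ** turns the named groups into kwargs
    print(d.year, d.month, d.day, calendar.isleap(d.year))  # 2016 2 29 True
```

Because the group names and the parameter names are identical, adding a field only means adding a named group and a matching parameter.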

- Light Rail

User

Answer 2

Building on @Joran Beasley's answer, I came up with the following solution, which seems to work:

Key points:

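My log file always follows the same structure, {Date} - {Type} - {Message}, so I use string slicing and splitting to break the items apart the way I need them. For example, {Date} is always 23 characters long and I only want the first 19.
Using line.startswith("2015") is crazy, since the date will eventually change, so I created a new function that uses a regex to match the date format I expect. Again, my log dates follow a specific pattern, so I can be specific.
The file is read by the first function, generateDicts(), which calls matchDate() to see whether the line being processed matches the {Date} format I'm looking for.
A new dict is created every time a valid {Date} is found, and everything up to the next valid {Date} is added to it.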
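To illustrate the slicing point, here is how the fixed layout can be cut up on one sample line from the question. This is a sketch assuming Python 3; split_entry is a hypothetical helper name. Splitting on " - " with maxsplit=3 keeps a " - " inside the message (such as "Return Code - 0") intact.

```python
def split_entry(line):
    """Split one header line of the form '{Date} - {logger} - {Type} - {Message}'."""
    # maxsplit=3: only the first three " - " separators are used, so a message
    # that itself contains " - " stays in one piece.
    date, _logger, level, message = line.rstrip("\n").split(" - ", 3)
    return {"Date": date, "Type": level, "Message": message}


line = ("2015-05-22 16:47:46,488 - __main__ - INFO - Success: Downloading the Files "
        "from Cloud Storage: Return Code - 0 and FileCount 1\n")

date = line[:23]              # the full {Date} field is 23 characters
print(len(date), date[:19])   # 23 2015-05-22 16:47:46  (first 19 drop ",488")
print(split_entry(line))
```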

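Function to split the log file:

import re

def generateDicts(log_fh):
    currentDict = {}
    for line in log_fh:
        if line.startswith(matchDate(line)):
            # a new entry starts here: emit the previous one, if any
            if currentDict:
                yield currentDict
            # split("-", 5) -> ['2015', '05', '22 16:46:46,985 ', ' __main__ ', ' INFO ', ' message...']
            parts = line.split("-", 5)
            currentDict = {"date": line[:19], "type": parts[4].strip(), "text": parts[-1].lstrip()}
        else:
            # continuation line (e.g. a traceback): append it to the current message
            currentDict["text"] += line
    if currentDict:
        yield currentDict

Function that checks whether the line being processed starts with a {Date} in the format I'm looking for:

def matchDate(line):
    matchThis = ""
    matched = re.match(r'\d\d\d\d-\d\d-\d\d\ \d\d:\d\d:\d\d', line)
    if matched:
        # matches a date and adds it to matchThis
        matchThis = matched.group()
    else:
        matchThis = "NONE"
    return matchThis

Both functions have to be defined before the file is parsed:

with open("/Users/stevenlevey/Documents/out_folder/out_loyaltybox/log_CardsReport_20150522164636.logs") as f:
    listNew = list(generateDicts(f))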
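For comparison, a self-contained sketch that combines the named-group idea from the first answer with the grouping logic above and produces the Date/Type/Message dicts asked for in the question. It assumes Python 3 and the "{Date} - {logger} - {Type} - {Message}" layout shown in the question; ENTRY_RE and parse_log are hypothetical names, and io.StringIO stands in for the real log file.

```python
import io
import re

# A header line: "2015-05-22 16:46:46,985 - __main__ - INFO - message..."
ENTRY_RE = re.compile(
    r'(?P<Date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - '
    r'(?P<Logger>\S+) - (?P<Type>[A-Z]+) - (?P<Message>.*)'
)


def parse_log(lines):
    """Yield one dict per log entry; non-header lines are appended to the message."""
    current = None
    for line in lines:
        line = line.rstrip("\n")
        m = ENTRY_RE.match(line)
        if m:
            if current is not None:
                yield current
            current = {k: m.group(k) for k in ("Date", "Type", "Message")}
        elif current is not None:
            # e.g. the traceback lines belonging to an ERROR entry
            current["Message"] += " " + line.strip()
    if current is not None:
        yield current


sample = io.StringIO(
    "2015-05-22 16:46:46,985 - __main__ - INFO - Starting to Wait for Files\n"
    "2015-05-22 16:48:48,180 - __main__ - ERROR - Failed: Waiting for files the Files from Cloud Storage: gs://folder/folder/\n"
    "Traceback (most recent call last):\n"
    '  File "<ipython-input-16-132cda1c011d>", line 10, in <module>\n'
    "    if numFilesDownloaded == 0:\n"
    "NameError: name 'numFilesDownloaded' is not defined\n"
)

for entry in parse_log(sample):
    print(entry)
```

Joining continuation lines with a single space reproduces the one-line Message shown in the question; use "\n" instead if you want to keep the traceback's original line breaks.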

User

Answer 3

Make a generator (I'm on a generator bender today):

def generateDicts(log_fh):
    currentDict = {}
    for line in log_fh:
        if line.startswith("2015"):  # you might want a better check here
            if currentDict:
                yield currentDict
            # split on " - " rather than "-", so the hyphens inside the date survive
            date, _logger, level, text = line.split(" - ", 3)
            currentDict = {"date": date, "type": level, "text": text}
        else:
            currentDict["text"] += line
    if currentDict:
        yield currentDict

with open("logfile.txt") as f:
    print(list(generateDicts(f)))

There may be a few small bugs... I didn't actually run this.
