How to parse this custom log file in Python


I am using Python logging to generate log files during processing, and I am trying to read those log files into a list/dict, convert that to JSON, and load it into a NoSQL database for processing.

The files are generated in the following format.

<pre><code>2015-05-22 16:46:46,985 - __main__ - INFO - Starting to Wait for Files
2015-05-22 16:46:56,645 - __main__ - INFO - Starting: Attempt 1 Checking for New Files from gs://folder/folder/
2015-05-22 16:47:46,488 - __main__ - INFO - Success: Downloading the Files from Cloud Storage: Return Code - 0 and FileCount 1
2015-05-22 16:48:48,180 - __main__ - ERROR - Failed: Waiting for files the Files from Cloud Storage: gs://folder/folder/
Traceback (most recent call last):
  File "<ipython-input-16-132cda1c011d>", line 10, in <module>
    if numFilesDownloaded == 0:
NameError: name 'numFilesDownloaded' is not defined
2015-05-22 16:49:17,918 - __main__ - INFO - Starting to Wait for Files
2015-05-22 16:49:32,160 - __main__ - INFO - Starting: Attempt 1 Checking for New Files from gs://folder/folder/
2015-05-22 16:49:39,329 - __main__ - INFO - Success: Downloading the Files from Cloud Storage: Return Code - 0 and FileCount 1
2015-05-22 16:53:30,706 - __main__ - INFO - Starting to Wait for Files
</code></pre>

Note: in the actual file there is a line break before each new date.

Basically, I am trying to read in this text file and produce a JSON object like the following:

<pre><code>{
 'Date': '2015-05-22 16:46:46,985',
 'Type': 'INFO',
 'Message':'Starting to Wait for Files'
}
...
{
 'Date': '2015-05-22 16:48:48,180',
 'Type': 'ERROR',
 'Message':'Failed: Waiting for files the Files from Cloud Storage: gs://folder/anotherfolder/
Traceback (most recent call last):
  File "<ipython-input-16-132cda1c011d>", line 10, in <module>
    if numFilesDownloaded == 0:
NameError: name 'numFilesDownloaded' is not defined
'
}
</code></pre>

The problem I am having:

I can add each line to a list or dict, but the error message sometimes spans multiple lines, so I end up splitting it incorrectly.

What I have tried:

I tried code like the following to split lines only on a valid date, but I cannot seem to capture the error messages that span multiple lines. I also tried regular expressions, thinking they might be a solution, but I could not find the right one to use. I do not really know how they work, so I tried a bunch of copy-and-paste without any success.

<pre><code>import itertools as it

listNew = []
with open(filename, 'r') as f:
    for key, group in it.groupby(f, lambda line: line.startswith('2015')):
        # Only groups of timestamped lines are kept; continuation lines are dropped.
        if key:
            for line in group:
                listNew.append(line)
</code></pre>

I also tried some crazy regex, but had no luck either:

<pre><code>logList = re.split(r'(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])', fileData)
</code></pre>

Any help would be appreciated. Thanks.

Edit: I posted a solution below for others struggling with the same thing.
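The asker's own solution is not reproduced on this page. As a rough sketch only, one possible approach is to treat any line that begins with a timestamp as the start of a new record and to append every other line (such as a traceback) to the message of the previous record. The function names `parse_log_lines` and `parse_log_file`, the `RECORD_START` pattern, and the file name `app.log` are assumptions made for illustration; the pattern assumes the `timestamp - name - LEVEL - message` layout shown in the sample above.

```python
import json
import re

# Assumed record header: "<timestamp> - <logger name> - <LEVEL> - <message>"
RECORD_START = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<name>\S+) - (?P<level>[A-Z]+) - (?P<message>.*)$"
)


def parse_log_lines(lines):
    """Group an iterable of log lines into a list of record dicts."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = RECORD_START.match(line)
        if match:
            # A timestamped line starts a new record.
            records.append({
                "Date": match.group("date"),
                "Type": match.group("level"),
                "Message": match.group("message"),
            })
        elif records:
            # Continuation line (for example a traceback): attach it
            # to the message of the most recent record.
            records[-1]["Message"] += "\n" + line
        # Lines that appear before the first timestamp are ignored.
    return records


def parse_log_file(path):
    """Read a log file (hypothetical helper) and return its records."""
    with open(path, "r") as f:
        return parse_log_lines(f)
```

With these assumptions, `json.dumps(parse_log_file("app.log"), indent=2)` would produce a JSON array with one object per log record. Matching on the full header rather than on `line.startswith('2015')` avoids hard-coding the year and keeps multi-line messages together.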
