How do I parse a log file that contains an unpredictable data pattern?

o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd

o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
# Should be indication of request i.e., line beginning with o, followed some data
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd

o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1 37451
n G-809573 1452830400 1 92673
# No line present i,e., (o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd)

import re

# o line of a request: the trailing field is "-"
reg_ex1 = r"o\s+\d+(\.\d+)?\d+\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+\d+\s+\d+\s+\d+\s+-"
# o line that closes a request: the trailing field is alphanumeric text
reg_ex2 = r"o\s+\d+(\.\d+)?\d+\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s+\d+\s+\d+\s+\d+\s+[a-zA-Z0-9_]+"

with open("some_file.log", 'r') as content_file:
    content = content_file.read()

pattern1 = re.compile(reg_ex1)
begin_lines = len(pattern1.findall(content))
pattern2 = re.compile(reg_ex2)
end_lines = len(pattern2.findall(content))

if begin_lines == end_lines:
    print("File has successful requests captured")
else:
    print("File has un-successful requests captured")

# If wrong-data generated is not for latest request, can be ignored.
# If wrong-data generated is for latest request, it should be caught and highlighted.

This may not be a good idea, though; please let me know.

o 123456789.000 10.10.10.10 3 30 10 001-
n A-123456===123 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1 37451
n G-809573==123 1452830400 1 926731
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 002-
n A-123456===456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1 37452
n G-809573===456 1452830400 1 926732
o 123456789.000 10.10.10.10 3 30 10 003-
n A-123456===789 1452830400 1 14523
n C-73652 1452830400 1 231543
n B-967845 1452830400 1 374513
n G-809573===789 1452830400 1 926733
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd

3 answers

User

#1 · edited 2024-10-04 05:33:15

^n\s.+[\n\r]+o\s.+[\n\r]+n\s.+|^n\s.+[\n\r]+n\s.+[\n\r]+n\s.+[\n\r]+n\s.+[\n\r]+(?!o)|^o\s.+[\n\r]+o\s.+[\n\r]+o\s.+
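The answer gives the pattern without explanation. One reading is that it searches for three kinds of suspicious sequence: an n line, a single o line, then another n line (one o line of a pair is missing); four n lines that are not followed by an o line; and three o lines in a row. The sketch below applies it with `re.MULTILINE` so that `^` matches at every line start. The helper name `find_anomalies` and the shortened sample lines are assumptions made for illustration.

```python
import re

# The pattern from the answer, split by alternative for readability.
ANOMALY = re.compile(
    r"^n\s.+[\n\r]+o\s.+[\n\r]+n\s.+"                            # n, single o, n
    r"|^n\s.+[\n\r]+n\s.+[\n\r]+n\s.+[\n\r]+n\s.+[\n\r]+(?!o)"   # 4 n lines, no o next
    r"|^o\s.+[\n\r]+o\s.+[\n\r]+o\s.+",                          # 3 o lines in a row
    re.MULTILINE,
)


def find_anomalies(text):
    """Return the 1-based line number where each suspicious span starts."""
    return [text.count("\n", 0, m.start()) + 1 for m in ANOMALY.finditer(text)]


# Shortened stand-ins for the log lines in the question.
block = (
    "o 1.0 10.10.10.10 3 30 10 -\n"
    "n A-1 1 1 1\nn C-2 1 1 2\nn B-3 1 1 3\nn G-4 1 1 4\n"
)
closing = "o 1.0 10.10.10.10 3 30 10 some_text more_text\n"

good = (block + closing) * 2
bad = block + closing + block + block + closing   # one closing line is missing

print(find_anomalies(good))   # []
print(find_anomalies(bad))    # [11]
```

An empty list means no suspicious sequence was found. Note that the second alternative hard-codes four n lines per request, so it only fits logs shaped like the samples.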

User

#2 · edited 2024-10-04 05:33:15

I would definitely recommend approach 1. Why? Because it lets us read and iterate over each line flexibly.

with open('file.txt', 'r') as fp:
    line = fp.readline()   # reads one line; use "for line in fp:" to visit every line
    print(type(line))      # <class 'str'>

    # do anything with line (a string), for example:
    # 1. split_list = line.split()   -> list of values separated by whitespace
    # 2. check the type of each element:
    #    split_list[0].isalpha(),
    #    split_list[0].isdigit(),
    #    split_list[0].isspace(), and so on,
    #    then add what you need to the final dict/list

Be sure to wrap every step in try/except, because log files are unpredictable.
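A minimal sketch of that advice, assuming the line layout seen in the question's samples: each line is split on whitespace, converted field by field, and any failure is recorded instead of stopping the run. The function names and the dictionary keys (`timestamp`, `ip`, `numbers`, `tail`, `id`, `epoch`) are hypothetical labels, since the thread never says what the fields mean.

```python
def parse_line(line):
    """Turn one log line into a dict; raise ValueError if it does not fit."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    kind = fields[0]
    if kind == "o":
        # assumed layout: o <float> <ip> <int> <int> <int> <trailing text...>
        if len(fields) < 7:
            raise ValueError("o line is too short")
        return {
            "kind": "o",
            "timestamp": float(fields[1]),
            "ip": fields[2],
            "numbers": [int(f) for f in fields[3:6]],
            "tail": " ".join(fields[6:]),
        }
    if kind == "n":
        # assumed layout: n <id> <int> <int> <int>
        if len(fields) != 5:
            raise ValueError("n line must have 5 fields")
        return {
            "kind": "n",
            "id": fields[1],
            "epoch": int(fields[2]),
            "numbers": [int(f) for f in fields[3:]],
        }
    raise ValueError("unknown line type {!r}".format(kind))


def parse_lines(lines):
    """Parse every line, collecting failures instead of stopping at the first."""
    records, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            records.append(parse_line(line))
        except ValueError as exc:   # int()/float() failures land here too
            errors.append((lineno, str(exc)))
    return records, errors


sample = [
    "o 123456789.000 10.10.10.10 3 30 10 -",
    "n A-123456 1452830400 1 1452",
    "# Should be indication of request",
    "n B-967845 1452830400 one 37451",
]
records, errors = parse_lines(sample)
print(len(records))                # 2
print([no for no, _ in errors])    # [3, 4]
```

Keeping the line number with each error makes it possible to decide afterwards whether a bad line belongs to the latest request or to an older one.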

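User

#3 · edited 2024-10-04 05:33:15

To check whether the file is good or bad, we look at its first and last lines, given that:

If the first line does not start with o, the file is bad
If the last line does not start with o, the file is bad
If both the first and the last line start with o, the file is good

list.txt: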

o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1  37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1  37451
n G-809573 1452830400 1 92673
# Should be indication of request i.e., line beginning with o, followed some data
o 123456789.000 10.10.10.10 3 30 10 -
n A-123456 1452830400 1 1452
n C-73652 1452830400 1 23154
n B-967845 1452830400 1  37451
n G-809573 1452830400 1 92673
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfd

So:

logFile = "list.txt"
with open(logFile) as f:
    content = f.readlines()

# drop empty lines and surrounding whitespace
content = [l.strip() for l in content if l.strip()]

# the file is good only if both the first and the last line start with "o"
if content and content[0].startswith("o") and content[-1].split()[0] == "o":
    print("File is good.")
else:
    print("File is bad.")
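The first/last-line check cannot see an o line that is missing in the middle of the file, which is the case marked by the comment in the question's second sample. A rough sketch of a stricter check follows. It walks the lines in order and assumes, based only on the samples, that a request line is an o line ending in "-" and that any other o line closes a request. The function name `check_sequence` and the messages are made up for illustration.

```python
def check_sequence(lines):
    """Return (line_number, message) for every ordering problem found."""
    problems = []
    opened_at = None   # line number of the request still waiting to be closed
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue   # skip blank lines and comment lines
        if line.startswith("o ") and line.endswith("-"):
            # request line (assumption: requests end in "-")
            if opened_at is not None:
                problems.append((opened_at, "request never closed"))
            opened_at = lineno
        elif line.startswith("o "):
            # any other o line is treated as a closing line
            if opened_at is None:
                problems.append((lineno, "closing line without a request"))
            opened_at = None
        elif line.startswith("n "):
            if opened_at is None:
                problems.append((lineno, "n line outside a request"))
        else:
            problems.append((lineno, "unrecognised line"))
    if opened_at is not None:
        problems.append((opened_at, "latest request never closed"))
    return problems


request = "o 123456789.000 10.10.10.10 3 30 10 -"
data = "n A-123456 1452830400 1 1452"
closing = "o 123456789.000 10.10.10.10 3 30 10 some_random_text"

print(check_sequence([request, data, closing]))
# []
print(check_sequence([request, data, request, data, closing]))
# [(1, 'request never closed')]
print(check_sequence([request, data, closing, request, data]))
# [(4, 'latest request never closed')]
```

The separate message for the latest request reflects the question's remark that an older broken request can be ignored while a broken latest request should be highlighted.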

Edit:

To get all the responses between valid pairs of o lines:

list.txt:

o 123456789.000 10.10.10.10 3 30 10 001-
n A-123456 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1  37451
n G-809573 1452830400 1 926731
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 002-
n A-123456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1  37452
n G-809573 1452830400 1 926732
o 123456789.000 10.10.10.10 3 30 10 some_random_text_alphanumeric_jdjfdjfdfhdkjfhdkhfdhfdfdfhkdhfkdjfdkjfkdfdkfdkjnc maxbgrsdfuyhlwkjdnkshbvhsgdvsdsjdbskdhskdjoihe73njndedejdoekekdednd
o 123456789.000 10.10.10.10 3 30 10 003-
n A-123456 1452830400 1 14523
n C-73652 1452830400 1 231543
n B-967845 1452830400 1  374513
n G-809573 1452830400 1 926733

So:

import re

def GetTheResponses(infile):
    with open(infile) as fp:
        red = fp.read()
        # re.S lets "." span newlines; the lazy ".*?" stops at the next "o "
        for result in re.findall(r'o (.*?)o ', red, re.S):
            print(result)

GetTheResponses('list.txt')

Output:

123456789.000 10.10.10.10 3 30 10 001-
n A-123456 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1  37451
n G-809573 1452830400 1 926731

123456789.000 10.10.10.10 3 30 10 002-
n A-123456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1  37452
n G-809573 1452830400 1 926732

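Edit 2 (for better readability):

# red is the file content read inside GetTheResponses above
count = 1
for result in re.findall(r'o (.*?)o ', red, re.S):
    print("Response Packet: {}".format(count))
    print("\n".join(result.split("\n")[1:]))   # drop the rest of the o line
    count += 1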

Output:

Response Packet: 1
n A-123456 1452830400 1 14521
n C-73652 1452830400 1 231541
n B-967845 1452830400 1  37451
n G-809573 1452830400 1 926731

Response Packet: 2
n A-123456 1452830400 1 14522
n C-73652 1452830400 1 231542
n B-967845 1452830400 1  37452
n G-809573 1452830400 1 926732
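In the output above, request 003 is silently dropped because it has no closing o line. As a possible alternative, the sketch below collects packets line by line and returns the unclosed trailing request separately so it can be highlighted. It assumes, from the samples only, that a request line is an o line ending in "-"; the name `split_packets` and the shortened log are illustrative.

```python
def split_packets(text):
    """Return (packets, pending) from raw log text.

    packets: one list of n lines per request that has a closing o line.
    pending: n lines of a trailing request with no closing line, else None.
    """
    packets, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("o ") and line.endswith("-"):
            current = []                  # request line: start collecting
        elif line.startswith("o "):
            if current is not None:
                packets.append(current)   # closing line: packet is complete
            current = None
        elif line.startswith("n ") and current is not None:
            current.append(line)
    return packets, current


log = """\
o 123456789.000 10.10.10.10 3 30 10 001-
n A-123456 1452830400 1 14521
o 123456789.000 10.10.10.10 3 30 10 some_random_text
o 123456789.000 10.10.10.10 3 30 10 002-
n A-123456 1452830400 1 14522
o 123456789.000 10.10.10.10 3 30 10 some_random_text
o 123456789.000 10.10.10.10 3 30 10 003-
n A-123456 1452830400 1 14523
"""

packets, pending = split_packets(log)
print(len(packets))   # 2
print(pending)        # ['n A-123456 1452830400 1 14523']
```

An unclosed request in the middle of the file is discarded when the next request line arrives, in line with the question's note that only a broken latest request needs attention.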
