Python regex: capturing a repeating pattern group

I want to capture each repeated #key=value group from log lines such as:

#msgtype=EVENT #server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)

Notice the hash present in the middle of the #msg value (# SysD 1). My attempt:

import re

# Renamed from `str` so the built-in str type is not shadowed.
text = 'Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT #server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)'
z = re.findall("(#.+?=.+?)(:?#|$)", text)
print(z)

2 answers

Anonymous user

Answer 1 · edited 2024-10-03 21:33:25

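(:?#|$) is a capturing group: it matches an optional : followed by #, or the end of the string (you probably meant the non-capturing (?:#|$)). Because the pattern contains two capturing groups, re.findall returns a list of tuples with one element per group instead of the matched strings. That group also consumes the # that starts the next pair, so the next match cannot begin there and every other pair is skipped.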
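A small sketch to make both effects concrete: it runs the question's pattern on the question's log line, next to a copy where (:?#|$) is replaced by the non-capturing (?:#|$). The variable names are only illustrative.

```python
import re

text = ('Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT '
        '#server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 '
        '#msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, '
        'Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)')

# Question's pattern: two capturing groups, so findall returns 2-tuples.
with_two_groups = re.findall(r"(#.+?=.+?)(:?#|$)", text)

# Same pattern with (?:...): one capturing group, so findall returns strings.
# The trailing '#' is still consumed, so pairs are still skipped.
one_group = re.findall(r"(#.+?=.+?)(?:#|$)", text)

print(with_two_groups)
print(one_group)
```

In both results #server, #rc and #reqid are missing, and #msg is cut off at the inner "# SysD", which is why the answers below stop each value with a lookahead instead of consuming the next #.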

You need:

re.findall(r'#[^\s=]+=.*?(?=\s*#[^\s=]+=|$)', text)

See the regex demo.
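A runnable sketch of the pattern above applied to the log line from the question; only the variable names (text, pairs) are my own choice.

```python
import re

text = ('Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT '
        '#server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 '
        '#msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, '
        'Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)')

# No capturing groups, so findall returns each whole "#key=value" match.
pairs = re.findall(r'#[^\s=]+=.*?(?=\s*#[^\s=]+=|$)', text)

for p in pairs:
    print(p)
```

The #msg value keeps its inner "# SysD 1" because a # followed by a space does not look like the start of a new #key= pair.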

Regex details:

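#[^\s=]+ - a # followed by 1+ characters other than whitespace and =
= - an = character
.*? - any 0+ characters other than line break characters, as few as possible
(?=\s*#[^\s=]+=|$) - a positive lookahead that ends the match right before (without consuming) either 0+ whitespace characters, a #, 1+ characters other than whitespace and =, and an =, or the end of the string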
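The breakdown above maps one-to-one onto a re.VERBOSE version of the same pattern. This is a sketch: the short test line is made up for illustration, and in verbose mode an unescaped # starts a comment, so the literal hash is written as \#.

```python
import re

# Same pattern as the answer, one piece per line of the breakdown.
PAIR_RE = re.compile(r"""
    \#[^\s=]+          # '#' then 1+ chars that are neither whitespace nor '='
    =                  # the '=' between key and value
    .*?                # the value: 0+ non-newline chars, as few as possible
    (?=                # stop right before (without consuming) either
        \s*\#[^\s=]+=  #   optional whitespace plus the next '#key='
      | $              #   or the end of the string
    )
""", re.VERBOSE)

compact = re.compile(r'#[^\s=]+=.*?(?=\s*#[^\s=]+=|$)')

line = "#a=1 #b=x # y #c=3"  # hypothetical input with a stray '# ' in a value
print(PAIR_RE.findall(line))
print(compact.findall(line))
```

Both compiled patterns should behave identically; the verbose form is just easier to maintain if the key rules change later.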

Anonymous user

Answer 2 · edited 2024-10-03 21:33:25

import re

s = "Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT #server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 #msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)"

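# A key is '#' plus letters and '='; a value runs until the next ' #key=' or the end.
matches = re.findall(r'#(?=[a-zA-Z]+=).+?=.*?(?= #[a-zA-Z]+=|$)', s)

# Split only on the first '=' so a value that itself contains '=' stays intact.
result = [item.split('=', 1) for item in matches]

print(result)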

Output:

[['#msgtype', 'EVENT'], ['#server', 'Web/Dev@server1web'], ['#func', 'LKZ_WriteData ( line 2992 )'], ['#rc', '0'], ['#msgid', 'XYZ0064'], ['#reqid', '0'], ['#msg', 'Web Activity end (section 200, # SysD 1, Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)']]
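If you want key and value separately without a split step, one option (a variation on the first answer's pattern, not taken from either answer) is to use exactly two capturing groups. re.findall then returns one (key, value) tuple per match, the same tuple behavior the question ran into, but used on purpose. The names pairs and fields are illustrative.

```python
import re

text = ('Oct 23 13:03:03.714012 prod1_xyz(RSVV)[201]: #msgtype=EVENT '
        '#server=Web/Dev@server1web #func=LKZ_WriteData ( line 2992 ) #rc=0 '
        '#msgid=XYZ0064 #reqid=0 #msg=Web Activity end (section 200, # SysD 1, '
        'Files 222, Bytes 343422089928, Errors 0, Aborted Files 0, Busy Files 0)')

# Group 1 = key (without '#'), group 2 = value.
# The lookahead (?=...) is not a capturing group, so it adds no tuple element.
pairs = re.findall(r'#([^\s=]+)=(.*?)(?=\s*#[^\s=]+=|$)', text)
fields = dict(pairs)

print(fields['func'])
print(fields['msg'])
```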
