I have two scenarios in which I need to extract some information from a log file with the following structure:
proc format;
2018-04-12T07:45:52,430 INFO [00000009] :t707982 - 26
2018-04-12T07:45:52,430 INFO [00000009] :t707982 - 27
2018-04-12T07:45:52,433 INFO [00000009] :t707982 - 35 '0010','0019'="08"
2018-04-12T07:45:52,434 INFO [00000009] :t707982 - 36 '0005','0007','0011','0013'="09"
NOTE: There were 95219365 observations read from the data set WORK.TESTE1.
2018-04-12T07:55:41,536 INFO [00000018] :t707982 - NOTE: The data set WORK.TESTE1 has 95219365 observations and 9 variables.
2018-04-12T07:55:41,537 INFO [00000018] :t707982 - NOTE: Compressing data set WORK.TESTE1 decreased size by 34.04 percent.
2018-04-12T07:55:41,538 INFO [00000018] :t707982 - Compressed is 92230 pages; un-compressed would require 139823 pages.
2018-04-12T07:55:42,230 INFO [00000018] :t707982 - NOTE: PROCEDURE FORMAT used (Total process time):
2018-04-12T07:55:42,231 INFO [00000018] :t707982 - real time 2:07.03
2018-04-12T07:55:42,231 INFO [00000018] :t707982 - user cpu time 1:56.98
2018-04-12T07:55:42,231 INFO [00000018] :t707982 - system cpu time 39.22 seconds
2018-04-12T07:55:42,231 INFO [00000018] :t707982 - memory 3159502.32k
proc format;
2018-04-12T08:45:52,430 INFO [00000009] :t707982 - 26
2018-04-12T08:45:52,434 INFO [00000009] :t707982 - 36 '0005','0007','0011','0013'="09"
NOTE: There were 95219365 observations read from the data set WORK.TESTE2.
2018-04-12T08:55:41,536 INFO [00000018] :t707982 - NOTE: The data set WORK.TESTE2 has 95219365 observations and 9 variables.
2018-04-12T08:55:41,537 INFO [00000018] :t707982 - NOTE: Compressing data set WORK.TESTE2 decreased size by 34.04 percent.
2018-04-12T08:55:41,538 INFO [00000018] :t707982 - Compressed is 92230 pages; un-compressed would require 139823 pages.
2018-04-12T08:55:42,230 INFO [00000018] :t707982 - NOTE: PROCEDURE FORMAT used (Total process time):
2018-04-12T08:55:42,231 INFO [00000018] :t707982 - real time 2:07.03
2018-04-12T08:55:42,231 INFO [00000018] :t707982 - user cpu time 1:56.98
2018-04-12T08:55:42,231 INFO [00000018] :t707982 - system cpu time 39.22 seconds
2018-04-12T08:55:42,231 INFO [00000018] :t707982 - memory 3159502.32k
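1) Extract everything between proc {format} and NOTE: PROCEDURE {format}.
2) If the first proc {format} has no NOTE: PROCEDURE {format}, capturing must stop as soon as another proc is found, and the NOTE: PROCEDURE {format} that belongs to the second proc {format} must not be returned for the first one, as in this example: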
proc format;
2018-04-12T07:45:52,430 INFO [00000009] :t707982 - 26
2018-04-12T07:45:52,430 INFO [00000009] :t707982 - 27
2018-04-12T07:45:52,433 INFO [00000009] :t707982 - 35 '0010','0019'="08"
2018-04-12T07:45:52,434 INFO [00000009] :t707982 - 36 '0005','0007','0011','0013'="09"
NOTE: There were 95219365 observations read from the data set WORK.TESTE1.
2018-04-12T07:55:41,536 INFO [00000018] :t707982 - NOTE: The data set WORK.TESTE1 has 95219365 observations and 9 variables.
2018-04-12T07:55:41,537 INFO [00000018] :t707982 - NOTE: Compressing data set WORK.TESTE1 decreased size by 34.04 percent.
2018-04-12T07:55:41,538 INFO [00000018] :t707982 - Compressed is 92230 pages; un-compressed would require 139823 pages.
proc format;
2018-04-12T08:45:52,430 INFO [00000009] :t707982 - 26
2018-04-12T08:45:52,434 INFO [00000009] :t707982 - 36 '0005','0007','0011','0013'="09"
NOTE: There were 95219365 observations read from the data set WORK.TESTE2.
2018-04-12T08:55:41,536 INFO [00000018] :t707982 - NOTE: The data set WORK.TESTE2 has 95219365 observations and 9 variables.
2018-04-12T08:55:41,537 INFO [00000018] :t707982 - NOTE: Compressing data set WORK.TESTE2 decreased size by 34.04 percent.
2018-04-12T08:55:41,538 INFO [00000018] :t707982 - Compressed is 92230 pages; un-compressed would require 139823 pages.
2018-04-12T08:55:42,230 INFO [00000018] :t707982 - NOTE: PROCEDURE FORMAT used (Total process time):
2018-04-12T08:55:42,231 INFO [00000018] :t707982 - real time 2:07.03
2018-04-12T08:55:42,231 INFO [00000018] :t707982 - user cpu time 1:56.98
2018-04-12T08:55:42,231 INFO [00000018] :t707982 - system cpu time 39.22 seconds
2018-04-12T08:55:42,231 INFO [00000018] :t707982 - memory 3159502.32k
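So my problem is the second scenario. My regex keeps pairing the first proc format with the NOTE: PROCEDURE FORMAT of the second proc format, whereas it should ignore the first block and capture only the second: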
(?s)(?<=proc[ ])(?P<type>\w+).*?(?:(?<=NOTE:[ ]PROCEDURE[ ])|(?<!=proc[ ]))(?P=type).*?(?=memory)
I tried adding a negative lookbehind with the OR operator, |(?<!=proc[ ]), but still had no success.
You can see my regex in action here.
Can you help me?
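For this data structure, to get the data between
proc {format} and NOTE: PROCEDURE {format}
you do not need the inline modifier (?s) to make the dot match a newline. Leaving it out prevents unnecessary backtracking.
If you want the data in between, you can add a capturing group and, instead of using a positive lookbehind at the start, match
proc format;
To get the data in between, you can match all lines that neither start with proc nor contain
NOTE: PROCEDURE
The data in between is then in capture group 2.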
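The per-line condition described here (a body line must not start with "proc " and must not contain " NOTE: PROCEDURE ") can be tried on its own before building the full pattern. The following is a minimal Python sketch. The names BODY_LINE and is_body_line are only illustrative, and the sample lines are copied from the logs in the question.

```python
import re

# Guard applied at the start of every candidate body line:
# reject lines that start with "proc " or contain " NOTE: PROCEDURE ".
BODY_LINE = re.compile(r"(?!proc |.* NOTE: PROCEDURE ).*")

lines = [
    "2018-04-12T07:45:52,430 INFO [00000009] :t707982 - 26",
    "NOTE: There were 95219365 observations read from the data set WORK.TESTE1.",
    "proc format;",
    "2018-04-12T07:55:42,230 INFO [00000018] :t707982 - NOTE: PROCEDURE FORMAT used (Total process time):",
]


def is_body_line(line):
    """Return True if the line may be part of the captured body."""
    return BODY_LINE.match(line) is not None


flags = [is_body_line(line) for line in lines]
print(flags)  # [True, True, False, False]
```

The first two lines pass the guard, while the next proc format; and the NOTE: PROCEDURE line are rejected, which is what stops the capture at the right place.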
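Explanation
^
  Start of the line
proc
  Match "proc " literally (including the trailing space)
(?P<type>\w+);
  Named group type matching 1+ word characters, followed by a literal ;
\r?\n\s*
  Match a newline and 0+ whitespace characters
(
  Start of capture group 2
(?:
  Non-capturing group
(?!proc |.* NOTE: PROCEDURE )
  Negative lookahead: assert that what is directly to the right is not "proc " and that the line does not contain " NOTE: PROCEDURE "
.*\r?\n
  Match any character except a newline 0+ times, followed by a newline
)*
  Close the non-capturing group and repeat it 0+ times to match all such lines
.*(?= NOTE: PROCEDURE )
  Match any character except a newline up to the point where what is on the right is " NOTE: PROCEDURE "
)
  Close group 2
Regex demo for the first data
Regex demo for the second data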
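Putting the pieces from the explanation together in order gives one pattern. The Python sketch below is an assumption about how it could be used, not a copy of the linked demos. The names PATTERN and extract_blocks are illustrative, the sample strings are shortened versions of the logs in the question, and re.MULTILINE is assumed so that ^ matches at the start of every line.

```python
import re

# Pattern assembled from the tokens listed in the explanation.
# re.MULTILINE is assumed so that ^ anchors at each line start.
PATTERN = re.compile(
    r"^proc (?P<type>\w+);\r?\n\s*"
    r"((?:(?!proc |.* NOTE: PROCEDURE ).*\r?\n)*"
    r".*(?= NOTE: PROCEDURE ))",
    re.MULTILINE,
)

# Shortened block that has its closing NOTE: PROCEDURE line.
complete = (
    "proc format;\n"
    "2018-04-12T07:45:52,430 INFO [00000009] :t707982 - 26\n"
    "NOTE: There were 95219365 observations read from the data set WORK.TESTE1.\n"
    "2018-04-12T07:55:42,230 INFO [00000018] :t707982 - NOTE: PROCEDURE FORMAT used (Total process time):\n"
    "2018-04-12T07:55:42,231 INFO [00000018] :t707982 - memory 3159502.32k\n"
)

# Shortened block without a closing NOTE: PROCEDURE line.
incomplete = (
    "proc format;\n"
    "2018-04-12T07:45:52,430 INFO [00000009] :t707982 - 27\n"
    "NOTE: There were 95219365 observations read from the data set WORK.TESTE1.\n"
)

# Second scenario: an unterminated block followed by a terminated one.
second_log = incomplete + complete


def extract_blocks(text):
    """Return (procedure type, body) for every terminated proc block."""
    return [(m.group("type"), m.group(2)) for m in PATTERN.finditer(text)]


for proc_type, body in extract_blocks(second_log):
    print(proc_type)
    print(body)
```

On second_log only the second block is returned: the first proc format; never reaches a NOTE: PROCEDURE line before the next proc, so the match attempt at that position fails and the search continues at the second proc format;.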