Using a regular expression to find the shortest possible match in a multi-line string

text1 (arbitrary chars and lines)\n<hr>\nBitmap: ./media/logo.bmp\ntext2 (arbitrary chars and lines)\ntext3 (arbitrary chars and lines)\n<hr>\nBitmap: ./media/logo.bmp\ntext2 (arbitrary chars lines)\n\n

import re

# string holds the sample text above
regex = re.compile(r'<hr>\n'
                   r'Bitmap: [\S\n ]*'
                   r'$')
print(regex.search(string).group())
>> '<hr>\nBitmap: ./media/logo.bmp\ntext2 (arbitrary chars and lines)\ntext3 (arbitrary chars and lines)\n<hr>\nBitmap: ./media/logo.bmp\ntext2 (arbitrary chars lines)\n\n'

import re

# string holds the sample text above
regex = re.compile(r'<hr>\n'
                   r'Bitmap: [\S]*\n'
                   r'[\s\S]*?(?=<hr>|\n\Z)')
print(regex.findall(string))
>> ['<hr>\nBitmap: ./media/logo.bmp\ntext2 (arbitrary chars and lines)\ntext3 (arbitrary chars and lines)\n', '<hr>\nBitmap: ./media/logo.bmp\ntext2 (arbitrary chars lines)\n']

2 Answers

User

#1 · edited 2024-10-08 18:26:57

Use

(?m)^<hr>\r?\nBitmap:[\s\S]*?(?=^<hr>$|\Z)

See proof.

Explanation

                                        
  (?m)                     set flags for this block (with ^ and $
                           matching start and end of line) (case-
                           sensitive) (with . not matching \n)
                           (matching whitespace and # normally)
                                        
  ^                        the beginning of a "line"
                                        
  <hr>                     '<hr>'
                                        
  \r?                      '\r' (carriage return) (optional (matching
                           the most amount possible))
                                        
  \n                       '\n' (newline)
                                        
  Bitmap:                  'Bitmap:'
                                        
  [\s\S]*?                 any character of: whitespace (\n, \r, \t,
                           \f, and " "), non-whitespace (all but \n,
                           \r, \t, \f, and " ") (0 or more times
                           (matching the least amount possible))
                                        
  (?=                      look ahead to see if there is:
                                        
    ^                        the beginning of a "line"
                                        
    <hr>                     '<hr>'
                                        
    $                        before an optional \n, and the end of a
                             "line"
                                        
   |                        OR
                                        
    \Z                       the end of the string
                                        
  )                        end of look-ahead
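To try the pattern above in Python, here is a minimal sketch. The variable names text, pattern and blocks are made up for illustration, and the sample string is modelled on the one in the question, assuming it ends with a blank line.

```python
import re

# Sample text modelled on the question (assumed to end with a blank line).
text = (
    "text1 (arbitrary chars and lines)\n"
    "<hr>\n"
    "Bitmap: ./media/logo.bmp\n"
    "text2 (arbitrary chars and lines)\n"
    "text3 (arbitrary chars and lines)\n"
    "<hr>\n"
    "Bitmap: ./media/logo.bmp\n"
    "text2 (arbitrary chars lines)\n"
    "\n"
)

# (?m) makes ^ and $ match at line boundaries; [\s\S]*? is lazy, so each
# match stops right before the next line that is exactly "<hr>", or at the
# end of the string.
pattern = re.compile(r"(?m)^<hr>\r?\nBitmap:[\s\S]*?(?=^<hr>$|\Z)")

blocks = pattern.findall(text)
for block in blocks:
    print(repr(block))
```

Note that in Python's re module \Z matches only at the very end of the string, so the last block keeps the trailing blank line. If that is unwanted, calling rstrip("\n") on each block is one way to drop it.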

User

#2 · edited 2024-10-08 18:26:57

This works: <hr>\nBitmap:.*\n(?:.*\n){1,2}

See: https://regex101.com/r/i64K0W/3

The problem in your regular expression is the *, which is greedy.
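The greedy vs. lazy difference can be seen in a small sketch. The short text below and the names greedy, lazy and counted are invented for illustration. The first two patterns differ only in the ? after the *, and the third is the line-counting pattern from this answer applied to a block that has only one line after the Bitmap line.

```python
import re

# Hypothetical two-block input, shorter than the sample in the question.
text = "<hr>\nBitmap: a.bmp\nfirst\n<hr>\nBitmap: b.bmp\nsecond\n"

# Greedy: [\s\S]* runs to the end of the string, so both blocks are
# returned as one match.
greedy = re.findall(r"<hr>\nBitmap:[\s\S]*(?=<hr>|\Z)", text)

# Lazy: [\s\S]*? stops as soon as the lookahead succeeds, giving one
# match per block.
lazy = re.findall(r"<hr>\nBitmap:[\s\S]*?(?=<hr>|\Z)", text)

# Line counting: {1,2} is also greedy, so with a single content line it
# takes the following "<hr>" line as well, and the second block is lost.
counted = re.findall(r"<hr>\nBitmap:.*\n(?:.*\n){1,2}", text)

print(len(greedy), len(lazy), len(counted))
```

So the line-counting pattern seems best suited to input where every block has a known number of lines, while the lazy quantifier with a lookahead adapts to blocks of any length.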
