I have a multi-line string with the following layout:
text1 (arbitrary chars and lines)\n
<hr>\n
Bitmap: ./media/logo.bmp\n
text2 (arbitrary chars and lines)\n
text3 (arbitrary chars and lines)\n
<hr>\n
Bitmap: ./media/logo.bmp\n
text2 (arbitrary chars lines)\n
\n
I want to match the substring that always occurs twice in the string (the last occurrence is always at the end):
<hr>\n
Bitmap: ./media/logo.bmp\n
text2 (arbitrary chars and lines)\n
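When I try to match it with re.search, it returns one long match:
import re

# Raw strings avoid invalid-escape warnings for \S
regex = re.compile(r'<hr>\n'
                   r'Bitmap: [\S\n ]*'
                   r'$')
print(re.search(regex, string).group())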
>> '<hr>\nBitmap: ./media/logo.bmp\ntext2 (arbitrary chars and lines)\ntext3 (arbitrary chars and lines)\n<hr>\nBitmap: ./media/logo.bmp\ntext2 (arbitrary chars and lines)\n\n'
Is it possible to find the short matches with a regex?
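Solution:
A lazy quantifier followed by a lookahead with an OR (alternation) returns both matches (one long, one short):
import re

# [\s\S]*? is lazy, so each match stops at the next <hr> or at the final newline
regex = re.compile(r'<hr>\n'
                   r'Bitmap: \S*\n'
                   r'[\s\S]*?(?=<hr>|\n\Z)')
print(re.findall(regex, string))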
>> ['<hr>\nBitmap: ./media/logo.bmp\ntext2 (arbitrary chars and lines)\ntext3 (arbitrary chars and lines)\n', '<hr>\nBitmap: ./media/logo.bmp\ntext2 (arbitrary chars lines)\n']
This works:
<hr>\nBitmap:.*\n(?:.*\n){1,2}
See: https://regex101.com/r/i64K0W/3
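For reference, here is a minimal Python sketch of that pattern. The sample text and the variable names (sample, block_re, one_line_re) are made up for illustration and only imitate the layout from the question. Note that {1,2} is greedy, so it takes two following lines whenever two are available; the one-line variant shown next to it is an assumption about what you might want if the block should end after a single line.

```python
import re

# Hypothetical sample text modelled on the layout in the question.
sample = (
    "intro line\n"
    "<hr>\n"
    "Bitmap: ./media/logo.bmp\n"
    "caption A\n"
    "unrelated line\n"
    "more unrelated text\n"
    "<hr>\n"
    "Bitmap: ./media/logo.bmp\n"
    "caption B\n"
    "\n"
)

# Pattern from the answer: the Bitmap line plus one or two following lines.
block_re = re.compile(r"<hr>\nBitmap:.*\n(?:.*\n){1,2}")
blocks = block_re.findall(sample)

# Variant (an assumption, not from the answer): exactly one following line.
one_line_re = re.compile(r"<hr>\nBitmap:.*\n.*\n")
short_blocks = one_line_re.findall(sample)

for block in blocks:
    print(repr(block))
for block in short_blocks:
    print(repr(block))
```

Because . does not match a newline by default, each .*\n consumes exactly one line, which is what keeps the matches short here.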
The problem in your regex is the *, which is greedy.
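To see the difference, here is a small sketch comparing a greedy * with its lazy form *? on a made-up two-block string (the text and variable names are hypothetical). Both patterns use the same lookahead; only the quantifier changes.

```python
import re

# Hypothetical input with two <hr> blocks.
text = "<hr>\nfirst\n<hr>\nsecond\n"

# Greedy: [\s\S]* runs to the end of the string, so there is one long match.
greedy = re.findall(r"<hr>\n[\s\S]*(?=<hr>|\Z)", text)

# Lazy: [\s\S]*? stops as soon as the lookahead succeeds, so each block
# is matched separately.
lazy = re.findall(r"<hr>\n[\s\S]*?(?=<hr>|\Z)", text)

print(greedy)
print(lazy)
```

The greedy version returns the whole string as a single match, while the lazy version returns one match per block.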