Regex to match multiple lines after an occurrence

Posted 2024-09-24 04:19:30


I have a regular expression:

^lineC:\n\t(.*)

and a multiline string:

lineA:
    line1
    line2
    line3

lineB:
    line4
    line5
    line6

lineC:
    line7
    line8
    line9

The regex matches only line7, but I need it to return line7, line8 and line9. I could do this:

^lineC:\n\t(.*)\n\t(.*)\n\t(.*)

But of course that stops working as soon as there are more lines under lineC. Any ideas? (Link with example on regex101.com)


3 answers

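Inside capture group 1, after the first matched line, you can optionally repeat a newline and a tab followed by the rest of the line, so that all the lines end up in group 1.

Note that .* can also match an empty string after \n\t, so a line containing only a tab is still part of the match.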

^lineC:\n\t(.*(?:\n\t.*)*)
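  • ^ Start of the string (start of a line when re.MULTILINE is set, as in the example code)
  • lineC:\n\t Match lineC: followed by a newline and a tab
  • ( Capture group 1
    • .* Match any character except a newline, 0+ times
    • (?:\n\t.*)* Optionally repeat a newline, a tab and the rest of the line
  • ) Close group 1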

Regex demo

Example code

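import re

regex = r"^lineC:\n\t(.*(?:\n\t.*)*)"

# Each indented line must start with a literal tab, because the pattern
# requires \t right after the newline (spaces would not match).
s = ("lineA:\n"
    "\tline1\n"
    "\tline2\n"
    "\tline3\n\n"
    "lineB:\n"
    "\tline4\n"
    "\tline5\n"
    "\tline6\n\n"
    "lineC:\n"
    "\tline7\n"
    "\tline8\n"
    "\tline9")

# re.MULTILINE lets ^ match at the start of every line, not just the string.
print(re.findall(regex, s, re.MULTILINE))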

Output

['line7\n\tline8\n\tline9']
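The result above is a single string in which the lines are still joined by "\n\t". If the individual lines are needed, one possible follow-up is to split group 1 on that separator. The sketch below is only an illustration: the sample string `s` is made up, and it deliberately contains a tab-only line to show the note about .* matching an empty remainder.

```python
import re

# Hypothetical sample: the second line under lineC holds only a tab.
s = "lineA:\n\tline1\n\nlineC:\n\tline7\n\t\n\tline9\n"

m = re.search(r"^lineC:\n\t(.*(?:\n\t.*)*)", s, re.MULTILINE)

# Group 1 is "line7\n\t\n\tline9"; splitting on "\n\t" yields one item per line.
lines = m.group(1).split("\n\t") if m else []
print(lines)  # ['line7', '', 'line9']
```

The empty string in the middle is the tab-only line; filter it out with a list comprehension if blank entries are not wanted.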

This might work:

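import re

# Indentation is written as an explicit \t escape: the pattern requires a
# literal tab after each newline, so space-indented text would not match.
test_str1 = """
lineA:
\tline1
\tline2
\tline3

lineB:
\tline4
\tline5
\tline6

lineC:
\tline7
\tline8
\tline9
"""

test_str2 = """
lineA:
\tline1
\tline2
\tline3

lineB:
\tline4
\tline5
\tline6

lineC:
\tline7
\tline8
\tline9
\tline10
\tline11

lineD:
\tline12
\tline13
\tline14
"""

# (?m) is the inline form of re.MULTILINE, so ^ matches at each line start.
p = re.compile(r'(?m)^lineC:((?:\n\t(?:.*))*)')
m = re.findall(p, test_str1)
print(m)
# ['\n\tline7\n\tline8\n\tline9']

m = re.findall(p, test_str2)
print(m)
# ['\n\tline7\n\tline8\n\tline9\n\tline10\n\tline11']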
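The pattern above hard-codes lineC and returns the block with a leading "\n\t". As a rough sketch of how the same idea could be generalized, the hypothetical helper `block_lines` below (not part of the answer) builds the pattern from a header name and returns the lines as a list. It assumes, like the answer, that every line of the block is indented with exactly one tab.

```python
import re

def block_lines(text, header):
    """Return the tab-indented lines directly below `header:`, or None.

    Hypothetical helper for illustration; the name and signature are
    assumptions, not something the answer defines.
    """
    # re.escape protects against regex metacharacters in the header name.
    pattern = re.compile(
        r"^" + re.escape(header) + r":((?:\n\t.*)*)", re.MULTILINE
    )
    m = pattern.search(text)
    if m is None:
        return None
    # Group 1 looks like "\n\tline7\n\tline8"; the first split item is empty.
    return m.group(1).split("\n\t")[1:]

sample = (
    "lineB:\n\tline4\n\n"
    "lineC:\n\tline7\n\tline8\n\tline9\n\tline10\n\n"
    "lineD:\n\tline12\n"
)
print(block_lines(sample, "lineC"))  # ['line7', 'line8', 'line9', 'line10']
print(block_lines(sample, "lineZ"))  # None
```

The block ends at the first line that does not start with a tab, so the blank line before lineD stops the match.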

This should do it:

import re
from pprint import pprint

reg = re.compile(r"(\w+):\n((?:\s+\w+(?:\n|$))*)")

with open('file.txt', 'r') as f:
    data = {
        name: lines.split()
        for name, lines in reg.findall(f.read())
    }

pprint(data)

Output:

{'lineA': ['line1', 'line2', 'line3'],
 'lineB': ['line4', 'line5', 'line6'],
 'lineC': ['line7', 'line8', 'line9']}

The pattern captures two main groups: (\w+) and ((?:\s+\w+(?:\n|$))*)

All other groups are non-capturing, which keeps findall easy to use.

(\w+)                     Capture a word in group 1
:\n                       Group 1 followed by :\n
(                         Start the capture for group 2
    (?:                   Start a non capturing group for the repeated content
        \s+\w+            Starts with whitespace followed by a word
        (?:\n|$)          Followed by a new line or the file end
    )*                    Non capturing group repeats zero or more times
)                         End group 2
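To try the breakdown above without creating file.txt, the same pattern can be run on an in-memory string. This is a minimal sketch: the variable `text` is a made-up stand-in for the file contents, with each item indented by a tab.

```python
import re

# Same pattern as in the answer above.
reg = re.compile(r"(\w+):\n((?:\s+\w+(?:\n|$))*)")

# Hypothetical stand-in for the contents of file.txt.
text = (
    "lineA:\n\tline1\n\tline2\n\tline3\n\n"
    "lineB:\n\tline4\n\tline5\n\tline6\n\n"
    "lineC:\n\tline7\n\tline8\n\tline9\n"
)

# Group 1 is the header name, group 2 the whole indented block;
# str.split() with no argument splits the block on any whitespace.
data = {name: lines.split() for name, lines in reg.findall(text)}
print(data["lineC"])  # ['line7', 'line8', 'line9']
```

Because each item has to match \w+, an item that contains a space or punctuation ends its block early, so this approach fits best when every line is a single word.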
