Regex: selectively include part of the start separator in the match

Posted 2024-10-03 15:32:31


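I want to find the string between two regex patterns. The tricky part is that some of the alternatives in the "before" pattern need to be included in the output string.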

Here is a simplified version of my code:

import re
start_pattern = "( StartString1 | StartString2 | StartString3ShouldBeIncluded | StartString4ShouldBeIncluded )"
end_pattern = "( EndString1 | EndString2 )"
joined_pattern = f'{start_pattern}(?P<content>.*?){end_pattern}'

input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
output = re.search(joined_pattern, input1).group('content')
print(output)  # Prints 'THECONTENT' which is what I want

input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
output = re.search(joined_pattern, input2).group('content')
print(output)  # Prints 'THECONTENT' but I want 'StartString3ShouldBeIncluded THECONTENT'

Is there a way to change this regex to get the output I want?
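To see where the start string goes, a quick check (a sketch that simply reuses the patterns above) is to print every group of the match. The start delimiter is captured by the first, unnamed group, which sits outside content, so group('content') can never contain it with this pattern.

```python
import re

# Same patterns as above; the spaces inside each alternative are matched literally
start_pattern = "( StartString1 | StartString2 | StartString3ShouldBeIncluded | StartString4ShouldBeIncluded )"
end_pattern = "( EndString1 | EndString2 )"
joined_pattern = f'{start_pattern}(?P<content>.*?){end_pattern}'

input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
match = re.search(joined_pattern, input2)

# Group 1 is the start delimiter, group 2 is the named 'content' group, group 3 is the end delimiter
print(match.groups())
# (' StartString3ShouldBeIncluded ', 'THECONTENT', ' EndString2 ')
```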


2 answers

Just move the named group so that it also wraps the start pattern:

import re

start_pattern = "( StartString1 | StartString2 | StartString3ShouldBeIncluded | StartString4ShouldBeIncluded )"
end_pattern = "( EndString1 | EndString2 )"
joined_pattern = f'(?P<content>{start_pattern}.*?){end_pattern}'

input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
output = re.search(joined_pattern, input1).group('content')
print(output)  # Prints ' StartString1 THECONTENT'

input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
output = re.search(joined_pattern, input2).group('content')
print(output)  # Prints ' StartString3ShouldBeIncluded THECONTENT'

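Output (note that this includes every start string, StartString1 as well, so the inclusion is not selective):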

 StartString1 THECONTENT
 StartString3ShouldBeIncluded THECONTENT
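Moving the group this way puts every start string into content. A sketch of a selective variant that still uses a single group, assuming the set of start strings to keep is known in advance: the start strings to drop are consumed as usual, while the ones to keep are only tested with a zero-width lookahead (?=...), so content begins right at them. The switch to non-capturing (?:...) groups is a change from the original patterns.

```python
import re

# Dropped start strings are consumed; kept ones are only checked by the lookahead,
# so the 'content' group starts at the kept start string itself.
start_pattern = (
    "(?: StartString1 | StartString2 "
    "|(?= StartString3ShouldBeIncluded | StartString4ShouldBeIncluded ))"
)
end_pattern = "(?: EndString1 | EndString2 )"
joined_pattern = f'{start_pattern}(?P<content>.*?){end_pattern}'

input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."

print(repr(re.search(joined_pattern, input1).group('content')))  # 'THECONTENT'
print(repr(re.search(joined_pattern, input2).group('content')))  # ' StartString3ShouldBeIncluded THECONTENT'
```

Because the lookahead consumes nothing, no concatenation is needed afterwards; the trade-off is that the leading space of the kept start string also ends up in content.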

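You can put the start strings that should be included into their own named group and concatenate the two named groups after matching. When one of the other start strings matches, that group does not participate in the match and its value is None, so use the or operator to default it to an empty string before joining it with the content group: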

import re
start_pattern = "( StartString1 | StartString2 |(?P<start> StartString3ShouldBeIncluded | StartString4ShouldBeIncluded ))"
end_pattern = "( EndString1 | EndString2 )"
joined_pattern = f'{start_pattern}(?P<content>.*?){end_pattern}'

input1 = "...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... "
match = re.search(joined_pattern, input1)
output = (match.group('start') or '') + match.group('content')
print(output)  # Prints 'THECONTENT' which is what I want

input2 = "...somejunk ... StartString3ShouldBeIncluded THECONTENT EndString2 ...somejunk ..."
match = re.search(joined_pattern, input2)
output = (match.group('start') or '') + match.group('content')
print(output)  # Prints ' StartString3ShouldBeIncluded THECONTENT'
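The snippets in this thread call .group() directly on the result of re.search, which returns None when nothing matches, so the call then raises AttributeError. A minimal sketch wrapping the same pattern in a helper that handles that case; the names extract and JOINED are hypothetical and not part of the answer.

```python
import re
from typing import Optional

# Same pattern as the answer: only the start strings to keep sit inside the named group 'start'
start_pattern = "( StartString1 | StartString2 |(?P<start> StartString3ShouldBeIncluded | StartString4ShouldBeIncluded ))"
end_pattern = "( EndString1 | EndString2 )"
JOINED = re.compile(f'{start_pattern}(?P<content>.*?){end_pattern}')


def extract(text: str) -> Optional[str]:
    """Hypothetical helper: return the kept start string plus content, or None if nothing matches."""
    match = JOINED.search(text)
    if match is None:  # no start/end pair found in the text
        return None
    # group('start') is None when a start string that should be dropped matched
    return (match.group('start') or '') + match.group('content')


print(repr(extract("...somejunk ... StartString1 THECONTENT EndString1 ...somejunk ... ")))  # 'THECONTENT'
print(repr(extract("... StartString4ShouldBeIncluded X EndString1 ...")))  # ' StartString4ShouldBeIncluded X'
print(extract("no delimiters here"))  # None
```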
