Python regex: optional capture groups or lastindex

match = re.compile('(group0 *** )(group1 section title)(group2 ***)')
sectionTitle = match.group(1)
if match.lastindex == 0: sectionType = 'section with no subs'
if match.lastindex == 1: sectionType = 'section with subs'
if match.lastindex == 2: sectionType = 'sub section'

sectionRegex = re.compile(r'(\*{3})')
m = re.search(sectionRegex, line)
if m.lastindex == 0:
    sectionName = re.sub(sectionRegex, '', line)
    # Set a section flag
if m.lastindex == 1:
    sectionName = re.sub(sectionRegex, '', line)
    # Set a sub section flag.

3 Answers

User

Answer 1 · Edited 2024-10-16 17:20:05

QUESTION 1: Is there a python regexp with capture groups that would let me access the section/sub section names as a capture group?
a single regexp to match the two - three "groups". May not exist

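Yes, this can be done. We can break the conditions down into the following tree:

Start of line + 0 to 2 spaces
Either of 2 alternations:
1. *** + any text [group 1]
2. 1+ spaces + *** + any text [group 2]
*** (optional) + end of line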

The tree above can be expressed with the following pattern:

^[ ]{0,2}(?:[*]{3}(.*?)|[ ]+[*]{3}(.*?))(?:[*]{3})?$


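Notice that sections and subsections are captured by different groups (group 1 and group 2). Both use the same syntax, .*?, with a lazy quantifier (the extra "?"), so that the optional trailing "***" can still match at the end of the line.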
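To see what the lazy quantifier buys, here is a minimal sketch that runs the pattern above against a few sample lines. The sample lines and the greedy variant are assumptions added for comparison only; they are not part of the original answer.

```python
import re

# Pattern from the answer: group 1 = section title, group 2 = subsection title.
LAZY = re.compile(r'^[ ]{0,2}(?:[*]{3}(.*?)|[ ]+[*]{3}(.*?))(?:[*]{3})?$')

# Hypothetical greedy variant (".*" instead of ".*?"), for comparison only.
GREEDY = re.compile(r'^[ ]{0,2}(?:[*]{3}(.*)|[ ]+[*]{3}(.*))(?:[*]{3})?$')

samples = [
    '*** Section with sub section ***',
    '  *** Section with no sub section',
    '      *** Sub Section ***',
]

lazy_groups = [LAZY.match(s).groups() for s in samples]
greedy_groups = [GREEDY.match(s).groups() for s in samples]

for sample, lazy, greedy in zip(samples, lazy_groups, greedy_groups):
    print(repr(sample))
    print('  lazy:  ', lazy)
    print('  greedy:', greedy)
```

With the lazy version the closing "***" is left for the optional trailing group, so the captured title excludes it. With the greedy variant the capture swallows the closing "***" as part of the title.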

QUESTION 2: How would the regexp groups allow me to ID section or sub section (possibly based on the number of /content in a match.group)?

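The regex above captures sections only in group 1 and subsections only in group 2. To make them easier to identify in code, I will use named groups, (?P<name>...), and retrieve the captures with .groupdict().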


To reference each section/subsection, you can use one of the following instead of printing the dict:

match.group("Section")
match.group(1)
match.group("SubSection")
match.group(2)
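The match.group("Section") and match.group("SubSection") calls above imply that the two capture groups are named. The following is a minimal sketch of that idea, not the answer's original code: it assumes the group names Section and SubSection from those calls, and the classify helper and the sample lines are hypothetical.

```python
import re

# Same pattern as in the answer, with the two capture groups named.
# Section is also reachable as group 1 and SubSection as group 2.
PATTERN = re.compile(
    r'^[ ]{0,2}(?:[*]{3}(?P<Section>.*?)|[ ]+[*]{3}(?P<SubSection>.*?))(?:[*]{3})?$'
)


def classify(line):
    """Return (kind, title) for a heading line, or None for any other line."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # groupdict() maps each named group to its capture,
    # or to None if that group did not take part in the match.
    captures = m.groupdict()
    if captures['Section'] is not None:
        return 'section', captures['Section'].strip()
    return 'subsection', captures['SubSection'].strip()


for line in ['*** Intro ***', '  *** Setup', '     *** Details ***', 'plain text']:
    print(repr(line), '->', classify(line))
```

Testing which named group is not None avoids relying on lastindex at all, which keeps the check readable if more groups are added to the pattern later.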

User

Answer 2 · Edited 2024-10-16 17:20:05

Regex:

(^\s+)(\*{3})([a-zA-Z\s]+)(\*{3})*

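It captures 3 or 4 groups: group 1 is the leading whitespace, group 2 the opening ***, group 3 the title text, and group 4 the closing ***, which participates only when a closing *** follows the title.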
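Because the fourth group in this pattern is optional, match.lastindex can tell the two cases apart. A minimal sketch, assuming each line is tested on its own with the compiled pattern's match method (the sample lines are illustrative):

```python
import re

pattern = re.compile(r'(^\s+)(\*{3})([a-zA-Z\s]+)(\*{3})*')

lines = [
    '  *** Section with no sub section',
    '   *** Sub Section ***',
    '*** Section with sub section ***',
]

results = []
for line in lines:
    m = pattern.match(line)
    # Match.lastindex is the index of the last capture group that took part
    # in the match; this sketch records None when the line does not match.
    results.append(m.lastindex if m else None)

print(results)  # [3, 4, None]
```

Note that `^\s+` requires at least one leading whitespace character, so the unindented third line does not match at all; relaxing it to `^\s*` would be one way to accept such lines.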

User

Answer 3 · Edited 2024-10-16 17:20:05

Assuming you mean that subsections are indented by 3 or more spaces, you can do this:

import re

data = '''
  *** Section with no sub section
*** Section with sub section ***
           *** Sub Section ***
 *** Another section
'''

pattern = r'(?:(^ {0,2}\*{3}.*\*{3} *$)|(^ {0,2}\*{3}.*)|(^ *\*{3}.*\*{3} *$))'

regex = re.compile(pattern, re.M)
print(regex.findall(data))

This gives you groups like the following:

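[('', '  *** Section with no sub section', ''),
 ('*** Section with sub section ***', '', ''),
 ('', '', '           *** Sub Section ***'),
 ('', ' *** Another section', '')]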
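Since each alternative in the pattern has its own capture group, the position of the non-empty entry in each findall tuple identifies the kind of heading. A minimal Python 3 sketch of that lookup, reusing the pattern and data from this answer; the labels and the classify helper are illustrative names, not part of the original answer:

```python
import re

data = '''
  *** Section with no sub section
*** Section with sub section ***
           *** Sub Section ***
 *** Another section
'''

pattern = r'(?:(^ {0,2}\*{3}.*\*{3} *$)|(^ {0,2}\*{3}.*)|(^ *\*{3}.*\*{3} *$))'
regex = re.compile(pattern, re.M)

# Hypothetical labels, one per capture group, in group order.
LABELS = ('section with subsections', 'section without subsections', 'subsection')


def classify(text):
    found = []
    for groups in regex.findall(text):
        # Each match comes from one alternative, so one group is non-empty.
        index = next(i for i, g in enumerate(groups) if g)
        # Strip the surrounding spaces and asterisks to leave the bare title.
        found.append((LABELS[index], groups[index].strip(' *')))
    return found


for label, title in classify(data):
    print(f'{label}: {title}')
```

The order of the alternatives matters here: the closed form (with a trailing ***) is tried before the open form, otherwise the second alternative would also match lines that end in ***.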

