Python regex: optional capture groups or lastindex

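I am using Python to search a file line by line for sections and sub sections.

<pre><code>*** Section with no sub section
*** Section with sub section ***
   *** Sub Section ***
*** Another section</code></pre>

Sections start with 0-2 spaces followed by three asterisks; sub sections have 2+ spaces followed by the asterisks.

I write out the section/sub section names without the "***", currently using re.sub.

^{pr2}$

Question 1: Is there a Python regexp with capture groups that would let me access the section/sub section names as a capture group?

Question 2: How would the regexp groups allow me to identify a section or sub section (possibly based on the matched groups)?

Example (non-working):

<pre><code>match = re.compile('(group0 *** )(group1 section title)(group2 ***)')
sectionTitle = match.group(1)
if match.lastindex = 0: sectionType = section with no subs
if match.lastindex = 1: sectionType = section with subs
if match.lastindex = 2: sectionType = sub section</code></pre>

Previous attempts:

I have been able to capture sections or sub sections with separate regexps and if statements, but I would like to do it all in one pass. Something along the lines of the following, although I am having trouble with the second group being greedy:

<pre><code>'(^\*{3}\s)(.*)(\s\*{3}$)'</code></pre>

I can't seem to get the greedy and optional groups to work together. <a href="http://pythex.org/" rel="nofollow">http://pythex.org/</a> has been very helpful for this.

I also tried capturing the asterisks with "(\*{3})" and then deciding between section and sub section based on the number of groups found:

<pre><code>sectionRegex = re.compile(r'(\*{3})')
m = re.search(sectionRegex, line)
if m.lastindex == 0:
    sectionName = re.sub(sectionRegex, '', line)
    # Set a section flag
if m.lastindex == 1:
    sectionName = re.sub(sectionRegex, '', line)
    # Set a sub section flag.</code></pre>

Maybe I am going about this all wrong. Any help is appreciated.

Latest update:

I have been experimenting with Pythex, the answers, and other research. I am now spending more time on capturing the words:

<pre><code>^[a-zA-Z]+$</code></pre>

and counting the number of asterisk matches to determine the "level". I am still searching for a single regexp to match the two - three "groups". It may not exist.

Thanks.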

1 Answer

Anonymous, 1 day ago

<blockquote> QUESTION 1: Is there a python regexp with capture groups that would let me access the section/sub section names as a capture group? <blockquote> a single regexp to match the two - three "groups". May not exist </blockquote> </blockquote>

Yes, this can be done. The conditions break down into the following tree:

<ul>
<li><kbd>start of line</kbd> + <kbd>0 to 2 spaces</kbd></li>
<li>Either of two alternatives:
<ol>
<li><code>***</code> + <kbd>any text</kbd> [group 1]</li>
<li><kbd>1+ spaces</kbd> + <code>***</code> + <kbd>any text</kbd> [group 2]</li>
</ol></li>
<li><code>***</code> (optional) + <kbd>end of line</kbd></li>
</ul>

The tree above can be expressed with the following pattern:

<pre class="lang-none prettyprint-override"><code>^[ ]{0,2}(?:[*]{3}(.*?)|[ ]+[*]{3}(.*?))(?:[*]{3})?$</code></pre>

<ul>
<li><a href="https://regex101.com/r/mV0gN4/1" rel="nofollow">regex101 DEMO</a></li>
</ul>

Note that sections and sub sections are captured by different groups ([group 1] and [group 2]). Both use the same syntax <code>.*?</code>, with a <a href="http://www.regular-expressions.info/repeat.html#lazy" rel="nofollow">lazy quantifier (the extra "?")</a>, so that the optional trailing <code>"***"</code> can still match.

<hr/>

<blockquote> QUESTION 2: How would the regexp groups allow me to ID section or sub section (possibly based on the number of /content in a match.group)? </blockquote>

The regex above captures sections only in group 1 and sub sections only in group 2. To make them easier to identify in code, I would use named groups (<a href="http://www.regular-expressions.info/named.html" rel="nofollow">^{<cd6>}</a>) and retrieve the captures with <a href="https://docs.python.org/2/library/re.html#re.MatchObject.groupdict" rel="nofollow">^{<cd7>}</a>.

<h3>Code:</h3>

^{pr2}$

<ul>
<li><a href="http://ideone.com/9fRpY6" rel="nofollow">ideone DEMO</a></li>
</ul>

To refer to each section/sub section, instead of printing the dict, you can use either the name or the number of the group:

<pre><code>match.group("Section")
match.group(1)
match.group("SubSection")
match.group(2)</code></pre>
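The named-group approach described in the answer can be sketched roughly as follows. This is a minimal illustration, not necessarily the answer author's own code: the group names `Section` and `SubSection` come from the `match.group(...)` calls above, while `SECTION_RE`, `classify`, and the sample lines (including the three-space indent of the sub section) are assumptions made for the example.

```python
import re

# The answer's pattern with the two capture groups given names.
# "Section" / "SubSection" are the names used in the answer's
# match.group(...) examples; SECTION_RE itself is a hypothetical name.
SECTION_RE = re.compile(
    r"^[ ]{0,2}"
    r"(?:[*]{3}(?P<Section>.*?)|[ ]+[*]{3}(?P<SubSection>.*?))"
    r"(?:[*]{3})?$"
)


def classify(line):
    """Return (kind, title) for a heading line, or None for any other line.

    Hypothetical helper: kind is "Section" or "SubSection".
    """
    match = SECTION_RE.match(line)
    if match is None:
        return None
    # Only one of the two alternatives can participate in a match, so
    # match.lastgroup names the group that captured the title
    # (match.lastindex would be 1 or 2 respectively).
    kind = match.lastgroup
    return kind, match.group(kind).strip()


if __name__ == "__main__":
    # Assumed sample input; the sub section is indented by three spaces.
    sample = [
        "*** Section with no sub section",
        "*** Section with sub section ***",
        "   *** Sub Section ***",
        "*** Another section",
        "an ordinary line",
    ]
    for line in sample:
        print(repr(line), "->", classify(line))
```

With this pattern, `match.lastindex` is never 0: it is `None` when no group took part in the match and otherwise the number of the last group that matched, so here it is 1 for a section and 2 for a sub section. Because `[ ]{0,2}` is tried greedily before the alternation, a heading indented by up to two spaces is reported as a section, and one indented by three or more spaces as a sub section.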
