Answers to: python regex mediawiki section parsing

python regex mediawiki section parsing

1 answer

Anonymous, 1 day ago

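First, you should know that I know a little Python but have never formally programmed in it... Codepad says this works, so here goes! :D Sorry the expression is so complex:

<pre><code>(?<!=)==([^=]+)==(?!=)([\s\S]*?(?=$|(?<!=)==[^=]+==(?!=)))</code></pre>

I believe this does what you asked for! On Codepad, <a href="http://codepad.org/TAXfTRFr" rel="nofollow">this code</a>:

^{pr2}$

produces the following result:

<pre><code>[('Mainsection1', '\nSome text here\n===Subsection1.1===\nOther text here\n\n'), ('Mainsection2', '\nText goes here\n===Subsecttion2.1===\nOther text goes here. ')]</code></pre>

EDIT: Broken down, the expression basically says:

<pre><code>01 (?<!=)              # First, look behind to assert that there is not an equals sign
02 ==                  # Match two equals signs
03 ([^=]+)             # Capture one or more characters that are not an equals sign
04 ==                  # Match two equals signs
05 (?!=)               # Then verify that there are no equals signs following this
06 (                   # Start a capturing group
07   [\s\S]*?          # Match zero or more of ANY character (even CrLf), but BE LAZY
08   (?=               # Look ahead to verify that either...
09     $               # this is the end of the source text
10     |               # -OR-
11     (?<!=)          # when I look behind there is no equals sign
12     ==              # then there are two equals signs
13     [^=]+           # then one or more characters that are not equals signs
14     ==              # then two equals signs
15     (?!=)           # then verify that there are no equals signs following this
16   )                 # End look-ahead group
17 )                   # End capturing group</code></pre>

Lines <code>03</code> and <code>06</code> specify the capturing groups for the main section title and the main section content, respectively.

If you are not fluent in regex, line <code>07</code> needs the most explanation:

<ul>
<li><code>\s</code> and <code>\S</code> inside a character class match anything that is whitespace or non-whitespace (i.e. anything at all). An alternative is the <code>.</code> operator, but whether it matches CrLf (carriage return / line feed) depends on the engine's options, or on whether you can specify them at all; in Python, <code>.</code> matches newlines only when the <code>re.DOTALL</code> flag is set. Since you want to match across multiple lines, the character class is the simplest way to guarantee a match.</li>
<li>The trailing <code>*?</code> means it matches zero or more instances of the "anything" character class, but lazily. The "lazy" quantifier (sometimes called "reluctant") is the opposite of the default "greedy" quantifier (the one without the trailing <code>?</code>): it does not consume a source character unless the source at the current position cannot be matched by the part of the expression that follows the lazy quantifier. In other words, it consumes characters until it finds either the end of the source text or another main section, which is marked by exactly two equals signs on either side of one or more characters that are not equals signs (whitespace included). Without the lazy operator, it would try to consume the entire source text and then backtrack until it could match one of the things that follow it in the expression (end of source or a section header).</li>
</ul>

Line <code>08</code> is a "look-ahead": it specifies that the expression that follows must be able to match, but must not be consumed.

END EDIT

AFAIK, it has to be this complicated in order to properly exclude the subsections. If you want to match the section name and section content into named groups, you can try the following:

<pre><code>(?<!=)==(?P<SectionName>[^=]+)==(?!=)(?P<SectionContent>[\s\S]*?(?=$|(?<!=)==[^=]+==(?!=)))</code></pre>

I can break it down for you if you like, just ask! EDIT (see the edit above) END EDIT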
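For anyone who wants to try the named-group version from Python, here is a minimal sketch. It is not the code from the Codepad paste: the sample text and the helper name `split_main_sections` are made up for illustration, and the pattern is simply the expression above written out with `re.VERBOSE` so the comments can sit next to each part.

```python
import re

# Hypothetical sample input, loosely modeled on the result shown in the answer.
SAMPLE = (
    "==Mainsection1==\n"
    "Some text here\n"
    "===Subsection1.1===\n"
    "Other text here\n"
    "\n"
    "==Mainsection2==\n"
    "Text goes here\n"
    "===Subsection2.1===\n"
    "Other text goes here.\n"
)

# Same expression as the named-group version, spread out with re.VERBOSE.
SECTION_RE = re.compile(
    r"""
    (?<!=)==                        # two '=' not preceded by another '='
    (?P<SectionName>[^=]+)          # heading text: one or more non-'=' characters
    ==(?!=)                         # two '=' not followed by another '='
    (?P<SectionContent>
        [\s\S]*?                    # lazily take any characters, newlines included
        (?=$|(?<!=)==[^=]+==(?!=))  # stop before the end of the text or the next main heading
    )
    """,
    re.VERBOSE,
)


def split_main_sections(text):
    """Return {heading: body} for main ('==') sections; subsections stay in the body."""
    return {
        m.group("SectionName").strip(): m.group("SectionContent")
        for m in SECTION_RE.finditer(text)
    }


if __name__ == "__main__":
    for name, body in split_main_sections(SAMPLE).items():
        print(repr(name), repr(body))
```

With this sample there should be two entries, keyed `Mainsection1` and `Mainsection2`, and the `===` subsection headings stay inside each body. One thing to watch: without `re.MULTILINE`, Python's `$` also matches just before a final newline, so the last section's body can lose its trailing newline.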
