<p>正则表达式</p>
<pre><code>"^([A-Z\s]+)(?:\[(?:[\w ]+)\])?:(.*?)$"
</code></pre>
<ul>
<li><code>A-Z</code>可以更改为<code>\w</code></li>
<li>要获取上下文,应将<code>(?:[\w ]+)</code>更改为<code>([\w ]+)</code></li>
</ul>
<p><strong>代码</strong></p>
<pre><code>import re
regex = r"^([A-Z\s]+)(?:\[(?:[\w ]+)\])?:(.*?)$"
test_str = ("PAUL: Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. \n\n"
"LEONARD: Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu. \n\n"
"EVIL NINJA [on the roof]: In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi. Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim. \n\n"
"PAUL [SCREAMING]: Aliquam lorem ante, dapibus in, viverra quis, feugiat a, tellus. ")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches):
matchNum = matchNum + 1
print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
for groupNum in range(0, len(match.groups())):
groupNum = groupNum + 1
print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
</code></pre>
<p><strong>输出</strong></p>
<pre><code>Match 1 was found at 0-100: PAUL: Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor.
Group 1 found at 0-4: PAUL
Group 2 found at 5-97: Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor.
Match 2 was found at 100-381: LEONARD: Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu.
Group 1 found at 100-107: LEONARD
Group 2 found at 108-378: Aenean massa. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo, fringilla vel, aliquet nec, vulputate eget, arcu.
Match 3 was found at 381-684: EVIL NINJA [on the roof]: In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi. Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim.
Group 1 found at 381-392: EVIL NINJA
Group 2 found at 406-681: In enim justo, rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium. Integer tincidunt. Cras dapibus. Vivamus elementum semper nisi. Aenean vulputate eleifend tellus. Aenean leo ligula, porttitor eu, consequat vitae, eleifend ac, enim.
Match 4 was found at 684-767: PAUL [SCREAMING]: Aliquam lorem ante, dapibus in, viverra quis, feugiat a, tellus.
Group 1 found at 684-689: PAUL
Group 2 found at 701-767: Aliquam lorem ante, dapibus in, viverra quis, feugiat a, tellus.
</code></pre>