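<p>In your pattern, you match only a single <code>-</code> on the left and on the right, and <code>.*?</code> matches 0 or more characters other than a newline, non-greedily.</p>
<p>That gives you many partial matches instead of matching the whole line.</p>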
<hr/>
<p>You could also use the matches, with capture group 1 as the filename and capture group 2 as the data:</p>
<pre><code>^ *-+([^-]+)-+((?:\n(?! *-).*)*)
</code></pre>
<p><strong>Explanation</strong></p>
<ul>
<li><code>^ *</code> Start of the line (the pattern is used with <code>re.M</code>), followed by optional spaces</li>
<li><code>-+</code> Match 1+ times <code>-</code></li>
<li><code>([^-]+)</code> Capture <strong>group 1</strong> for the date part, matching 1+ characters other than <code>-</code></li>
<li><code>-+</code> Match 1+ times <code>-</code></li>
<li><code>(</code> Capture <strong>group 2</strong> for the data part
<ul>
<li><code>(?:\n(?! *-).*)*</code> Match all following lines that do not start with optional spaces followed by <code>-</code></li>
</ul>
</li>
<li><code>)</code> Close group 2</li>
</ul>
<p><a href="https://regex101.com/r/xhK9Pk/1" rel="nofollow noreferrer">Regex demo</a></p>
<p>For example:</p>
<pre><code>import re
pattern = r"^ *-+([^-]+)-+((?:\n(?! *-).*)*)"
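<p>The same idea can be written in verbose mode so that each part carries its own comment. This is only a sketch: it assumes header lines may have leading spaces, and the name <code>HEADER_BLOCK</code> and the short sample text are made up for illustration.</p>

```python
import re

# Verbose form of a header-plus-block pattern (illustrative name).
# With re.X, whitespace and comments inside the pattern are ignored,
# so a literal space has to be written as [ ].
HEADER_BLOCK = re.compile(
    r"""
    ^[ ]*-+                 # header line: optional leading spaces (assumption), then 1+ hyphens
    ([^-]+)                 # group 1: the header text, anything except a hyphen
    -+                      # closing run of 1+ hyphens
    (                       # group 2: the lines that follow the header
      (?:\n(?![ ]*-).*)*    # every following line that is not another header
    )
    """,
    re.M | re.X,
)

# Hypothetical sample text, not the data from the question.
sample = "-- Monday --\nline one\nline two\n-- Friday --\nline three\n"

for header, body in HEADER_BLOCK.findall(sample):
    print(repr(header.strip()), repr(body.strip()))
```

<p>Because <code>re.X</code> drops unescaped whitespace, the optional leading spaces are spelled <code>[ ]*</code> here rather than with a bare space.</p>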
s = (" ~~~~~~~~~~~~~~~~~~~~~~~\n"
"| |\n"
"| First Block of text |\n"
"| |\n"
" ~~~~~~~~~~~~~~~~~~~~~~~\n\n"
" - Monday 8 August 2021 -\n\n"
" ~~~~~~~~~~~~~~~~~~~~~~~\n"
"| |\n"
"| Second Block of text |\n"
"| |\n"
" ~~~~~~~~~~~~~~~~~~~~~~~\n\n"
" - Friday 12 August 2021 -\n\n"
" ~~~~~~~~~~~~~~~~~~~~~~~\n"
"| |\n"
"| 3rd Block of text |\n"
"| |\n"
" ~~~~~~~~~~~~~~~~~~~~~~~\n"
" \n"
" - Friday 19 August 2021 -\n\n"
" ~~~~~~~~~~~~~~~~~~~~~~~\n"
"| |\n"
"| 4th Block of text |\n"
"| |\n"
" ~~~~~~~~~~~~~~~~~~~~~~~\n")
matches = re.findall(pattern, s, re.M)
if matches:
    # Use the first match: group 1 is the filename, group 2 is the data
    filename = matches[0][0].strip()
    data = matches[0][1].strip()
    print(filename)
    print(data)
</code></pre>
<p>Output</p>
<pre><code>Monday 8 August 2021
~~~~~~~~~~~~~~~~~~~~~~~
| |
| Second Block of text |
| |
~~~~~~~~~~~~~~~~~~~~~~~
</code></pre>
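<p>The example above prints only the first match. To collect every header together with the block that follows it, something along these lines might work. It is a sketch under assumptions: the pattern variant accepts leading spaces before the hyphens, the sample text is shortened and made up, and <code>blocks</code> is an illustrative name.</p>

```python
import re

# Assumed variant of the pattern that tolerates leading spaces before the hyphens.
pattern = re.compile(r"^ *-+([^-]+)-+((?:\n(?! *-).*)*)", re.M)

# Shortened, hypothetical sample in the same layout as the text above.
text = (
    " ~~~~~~~\n"
    "| First |\n"
    " ~~~~~~~\n\n"
    " - Monday 8 August 2021 -\n\n"
    " ~~~~~~~\n"
    "| Second |\n"
    " ~~~~~~~\n\n"
    " - Friday 12 August 2021 -\n\n"
    " ~~~~~~~\n"
    "| 3rd |\n"
    " ~~~~~~~\n"
)

# Map each header (group 1) to the block that follows it (group 2).
blocks = {
    m.group(1).strip(): m.group(2).strip()
    for m in pattern.finditer(text)
}

for name, data in blocks.items():
    print(name)
    print(data)
    print()
```

<p>Note that text before the first header (the "First" block here) is not captured, since each match starts at a header line.</p>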