Answers to: Python regex: split a large text file into smaller parts

1 answer

Anonymous, 1 day ago

Expertise: python, mysql, java

In your pattern you match only a single `-` on the left and on the right, and `.*?` matches 0 or more characters other than a newline, non-greedily. That gives you many partial matches instead of matching the whole line.

---

You could also use a match instead, with capture group 1 as the file name and capture group 2 as the data:

```
^-+([^-]+)-+((?:\n(?!-).*)*)
```

Explanation

- `^` Start of a line (the code below uses `re.M`)
- `-+` Match 1 or more `-`
- `([^-]+)` Capture group 1, the date part: match 1 or more characters other than `-`
- `-+` Match 1 or more `-`
- `(` Capture group 2, the data part
  - `(?:\n(?!-).*)*` Match all following lines that do not start with `-`
- `)` Close group 2

[Regex demo](https://regex101.com/r/xhK9Pk/1)

For example:

```python
import re

pattern = r"^-+([^-]+)-+((?:\n(?!-).*)*)"

s = (" ~~~~~~~~~~~~~~~~~~~~~~~\n"
     "| |\n"
     "| First Block of text |\n"
     "| |\n"
     " ~~~~~~~~~~~~~~~~~~~~~~~\n\n"
     "- Monday 8 August 2021 -\n\n"
     " ~~~~~~~~~~~~~~~~~~~~~~~\n"
     "| |\n"
     "| Second Block of text |\n"
     "| |\n"
     " ~~~~~~~~~~~~~~~~~~~~~~~\n\n"
     "- Friday 12 August 2021 -\n\n"
     " ~~~~~~~~~~~~~~~~~~~~~~~\n"
     "| |\n"
     "| 3rd Block of text |\n"
     "| |\n"
     " ~~~~~~~~~~~~~~~~~~~~~~~\n"
     " \n"
     "- Friday 19 August 2021 -\n\n"
     " ~~~~~~~~~~~~~~~~~~~~~~~\n"
     "| |\n"
     "| 4th Block of text |\n"
     "| |\n"
     " ~~~~~~~~~~~~~~~~~~~~~~~\n")

# re.M makes ^ match at the start of every line, not only at the start of s
matches = re.findall(pattern, s, re.M)
if matches:
    filename = matches[0][0].strip()  # group 1: the date header
    data = matches[0][1].strip()      # group 2: the block under that header
    print(filename)
    print(data)
```

`re.findall` returns one `(date, data)` tuple per date header; this example prints only the first one.

Output

```
Monday 8 August 2021
~~~~~~~~~~~~~~~~~~~~~~~
| |
| Second Block of text |
| |
 ~~~~~~~~~~~~~~~~~~~~~~~
```
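Since the question is about splitting the whole file rather than extracting one section, the same idea can be applied to every match with `re.finditer`, writing each section to its own file. This is a minimal sketch, not part of the original answer: the function names `split_sections` and `write_sections`, the `<header>.txt` naming scheme and the output directory are hypothetical. Like the pattern above, it assumes every header line starts with `-` and no line inside a block does. Text before the first header (the "First Block of text" in the sample) is not part of any match and is skipped.

```python
import re
from pathlib import Path

# Same pattern as above: group 1 = header text between the dashes,
# group 2 = all following lines up to the next line that starts with "-".
SECTION_RE = re.compile(r"^-+([^-]+)-+((?:\n(?!-).*)*)", re.M)


def split_sections(text: str) -> dict[str, str]:
    """Map each header (hypothetical helper) to the block of text under it."""
    sections = {}
    for m in SECTION_RE.finditer(text):
        name = m.group(1).strip()
        # Drop the blank lines around the block but keep the indentation
        # of its first line (plain strip() would eat the space before "~").
        sections[name] = m.group(2).lstrip("\n").rstrip()
    return sections


def write_sections(text: str, out_dir: str) -> list[Path]:
    """Write each section to <out_dir>/<header>.txt (hypothetical naming scheme)."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, data in split_sections(text).items():
        path = target / f"{name}.txt"
        path.write_text(data + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Calling `write_sections(s, "parts")` on the sample string would create one file per date header. Real headers may need sanitising (for example replacing `/`) before they are safe to use as file names.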
