How do I build a regex that matches the various strings between repeated delimiters?

2 answers

User
Answer 1 · edited 2024-05-17 19:43:16

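One option is to use the Python regex module from PyPI together with the \G anchor.
Group 1 holds the book name; groups 2, 3 and 4 hold the chapter number, the verse number and the verse text that follow it.
When looping over the results, you can check which groups are present.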
\b(?:([A-Z]{2,})(?= \d+:\d)|\G(?!^))(?:(\d+):(\d+))?\s*((?:[^\dA-Z]+|\d++(?!:\d)|[A-Z](?![A-Z]+ \d+:\d))*)
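Explanation
\b  Word boundary
(?:  Non-capturing group
([A-Z]{2,})(?= \d+:\d)  Capture group 1: match 2 or more uppercase letters and assert that directly to the right are a space, 1+ digits, a colon and a digit
|  Or
\G(?!^)  Assert the position at the end of the previous match, but not at the start of the string
)  Close the non-capturing group
(?:  Non-capturing group
(\d+):(\d+)  Capture 1+ digits in group 2 and 1+ digits in group 3, separated by a colon
)?\s*  Close the group, make it optional, and match optional whitespace characters
(  Capture group 4
(?:  Non-capturing group
[^\dA-Z]+  Match 1+ characters other than a digit or A-Z
|  Or
\d++(?!:\d)  Match 1+ digits possessively and assert that what follows is not a colon and a digit
|  Or
[A-Z](?![A-Z]+ \d+:\d)  Match a character A-Z and assert that directly to the right is not 1+ characters A-Z, a space, 1+ digits, a colon and a digit
)*  Close the non-capturing group and repeat it 0+ times
)  Close group 4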
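The possessive quantifier in \d++(?!:\d) matters because a plain \d+ can backtrack and give up digits until the lookahead passes. The sketch below illustrates this with the standard re module only. The second pattern, which also forbids a following digit, is an emulation chosen for this illustration and is not part of the answer's pattern (as far as I know, re itself only accepts ++ from Python 3.11 on).

```python
import re

marker = "49:32 The purchase"   # start of a chapter:verse marker
plain_number = "40 days"        # an ordinary number inside verse text

# Backtracking version: \d+ first takes "49", the lookahead fails on ":3",
# then \d+ gives back "9" and the lookahead passes after "4".
backtracking = re.compile(r"\d+(?!:\d)")

# Emulation (an assumption for this sketch): also reject a following digit,
# so the run of digits cannot be shortened to sneak past the lookahead.
atomic_like = re.compile(r"\d+(?!\d|:\d)")

partial = backtracking.match(marker).group()
rejected = atomic_like.match(marker)
whole = atomic_like.match(plain_number).group()

print(repr(partial))   # the unwanted partial match
print(rejected)        # no match on a chapter:verse marker
print(repr(whole))     # ordinary numbers still match in full
```

With the backtracking version, group 4 would swallow the first digit of the next chapter number instead of stopping in front of it.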
For example:
import regex  # third-party module from PyPI (pip install regex)

pattern = r"\b(?:([A-Z]{2,})(?= \d+:\d)|\G(?!^))(?:(\d+):(\d+))?\s*((?:[^\dA-Z]+|\d++(?!:\d)|[A-Z](?![A-Z]+ \d+:\d))*)"

s = ("GENESIS 1:1 In the beginning God created the heavens ... the ground. "
     "2:7 And the LORD ... I buried Leah. "
     "49:32 The purchase of the field and of the cave ... and he was put in a coffin in Egypt. "
     "EXODUS 1:1 Now these are the names ...\n")

matches = regex.finditer(pattern, s)

for match in matches:
    if match.group(1):
        # A match with group 1 set is a book header
        print(f"Book name: {match.group(1)}")
        print(" ")
    else:
        # Otherwise it is a verse: chapter, verse number and text
        print(f"Chapter Nr: {match.group(2)}\nVerse Nr: {match.group(3)}\nThe verse: {match.group(4)}\n")
Output
Book name: GENESIS

Chapter Nr: 1
Verse Nr: 1
The verse: In the beginning God created the heavens ... the ground.

Chapter Nr: 2
Verse Nr: 7
The verse: And the LORD ... I buried Leah.

Chapter Nr: 49
Verse Nr: 32
The verse: The purchase of the field and of the cave ... and he was put in a coffin in Egypt.

Book name: EXODUS

Chapter Nr: 1
Verse Nr: 1
The verse: Now these are the names ...

User
Answer 2 · edited 2024-05-17 19:43:16

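I came up with a solution that uses only Python's built-in re module. The answer above is what put me on the right track. It turns out that the wrench I tried to throw in by testing with LORD 2:8 ... is not actually a problem: nowhere in the whole string does a non-title uppercase word appear directly before a number like that, without punctuation between [A-Z] and \d.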
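To make that caveat concrete, here is a small sketch using the re pattern from this answer. Both input strings are contrived for illustration and do not come from the real text. It shows that an all-caps word directly followed by chapter:verse would be reported as a book name, while the same word followed by ordinary text is not.

```python
import re

# Pattern from this answer, split across lines for readability.
pattern = re.compile(
    r"(?:([A-Z]{2,})(?= \d+:\d)|(?!^))"
    r"(?:(\d+):(\d+))?\s*"
    r"((?:[^\dA-Z]+(?!:\d)|[A-Z](?![A-Z]+ \d+:\d))+)"
)

def book_names(text):
    """Hypothetical helper: list every match that set group 1 (a book header)."""
    return [m.group(1) for m in pattern.finditer(text) if m.group(1)]

# Contrived inputs (assumptions, not taken from the source text).
tricky = "GENESIS 1:1 And the LORD 2:8 planted a garden"
normal = "GENESIS 1:1 And the LORD God formed man"

tricky_books = book_names(tricky)
normal_books = book_names(normal)

print(tricky_books)  # LORD is mistaken for a book header here
print(normal_books)  # only the real header is reported
```

So the pattern relies on the input never containing that shape, which holds for the text in question.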
Using the same example with the derived pattern:
import re

pattern = r"(?:([A-Z]{2,})(?= \d+:\d)|(?!^))(?:(\d+):(\d+))?\s*((?:[^\dA-Z]+(?!:\d)|[A-Z](?![A-Z]+ \d+:\d))+)"

s = ("GENESIS 1:1 In the beginning God created the heavens ... the ground. "
     "2:7 And the LORD ... I buried Leah. "
     "49:32 The purchase of the field and of the cave ... and he was put in a coffin in Egypt. "
     "EXODUS 1:1 Now these are the names ...\n")

# The iterator must be named `matches`, since the loop below iterates over it
matches = re.finditer(pattern, s)

for match in matches:
    if match.group(1):
        # A match with group 1 set is a book header
        print(f"Book name: {match.group(1)}")
        print(" ")
    else:
        # Otherwise it is a verse: chapter, verse number and text
        print(f"Chapter Nr: {match.group(2)}\nVerse Nr: {match.group(3)}\nThe verse: {match.group(4)}\n")
As with regex, the output is:
Book name: GENESIS

Chapter Nr: 1
Verse Nr: 1
The verse: In the beginning God created the heavens ... the ground.

Chapter Nr: 2
Verse Nr: 7
The verse: And the LORD ... I buried Leah.

Chapter Nr: 49
Verse Nr: 32
The verse: The purchase of the field and of the cave ... and he was put in a coffin in Egypt.

Book name: EXODUS

Chapter Nr: 1
Verse Nr: 1
The verse: Now these are the names ...
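If the goal is to work with the verses rather than print them, the same loop can collect the matches into a dictionary keyed by book name. This is only a sketch: the function name parse_verses, the tuple layout and the shortened sample string are my own assumptions, and the pattern is the re one from this answer.

```python
import re
from collections import defaultdict

# Pattern from this answer, split across lines for readability.
PATTERN = re.compile(
    r"(?:([A-Z]{2,})(?= \d+:\d)|(?!^))"
    r"(?:(\d+):(\d+))?\s*"
    r"((?:[^\dA-Z]+(?!:\d)|[A-Z](?![A-Z]+ \d+:\d))+)"
)

def parse_verses(text):
    """Hypothetical helper: map book name -> list of (chapter, verse, text)."""
    books = defaultdict(list)
    current_book = None
    for m in PATTERN.finditer(text):
        if m.group(1):
            # Group 1 is set only for book headers
            current_book = m.group(1)
        elif current_book is not None and m.group(2):
            # Verse match: store numbers as ints and trim surrounding whitespace
            books[current_book].append(
                (int(m.group(2)), int(m.group(3)), m.group(4).strip())
            )
    return dict(books)

# Shortened sample in the same shape as the example string (an assumption).
sample = ("GENESIS 1:1 In the beginning ... the ground. "
          "2:7 And the LORD ... Leah. "
          "EXODUS 1:1 Now these are the names ...")

result = parse_verses(sample)
for book, verses in result.items():
    print(book, verses)
```

Tracking current_book in the loop is what ties each verse back to the most recent header, since the header and its verses arrive as separate matches.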
