How do I write a regular expression that matches either of two possible groups?

Posted 2024-05-17 05:43:37


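I am trying to find instances in a text of a Roman numeral followed by a period and a space, such as "IV. ". These mark the start of a verse. However, some verses do not begin with a Roman numeral, so I inserted an [NV] tag at the start of those verses. I have one regex that finds the numerals and one that finds the [NV] tag, but I cannot combine them into a single regex that looks for either one.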

My regex for finding the numerals is:

numeralpat = re.compile(r'[IVX]{1,4}\. ')

I thought I could put it in a set together with the other regex, to find either a numeral or the [NV] tag:

numeralpat = re.compile(r'[(\[NV\])([IVX]{1,4}\. )]')

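This causes problems because brackets of the same type end up nested, so I tried escaping various characters to make it work. None of those attempts worked for me. Can this be done with a regex?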

Edited to add sample text:

Text:

I. this is some text with a verse numeral
II. this is some text with a verse numeral
III. this is some text with a verse numeral
[NV]this is text with no verse numeral
IV. this is some text with a verse numeral
V. this is some text with a verse numeral

Expected matches:

'I. '
'II. '
'III. '
'[NV]'
'IV. '
'V. '

2 Answers

You can combine the two regexes with alternation, like this:

(?:\[NV\]|[IVX]{1,4}\. )

This matches either the literal [NV], or one to four of the characters I, V, X followed by a period and a space.
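As a quick check, the alternation can be run against the sample text from the question. This is a minimal sketch: the variable names `text` and `matches` are chosen here for illustration, and `re.findall` returns the whole match because the group is non-capturing.

```python
import re

# Alternation: the literal tag [NV], OR 1-4 of I/V/X followed by ". "
numeralpat = re.compile(r"(?:\[NV\]|[IVX]{1,4}\. )")

# Sample text copied from the question
text = (
    "I. this is some text with a verse numeral\n"
    "II. this is some text with a verse numeral\n"
    "III. this is some text with a verse numeral\n"
    "[NV]this is text with no verse numeral\n"
    "IV. this is some text with a verse numeral\n"
    "V. this is some text with a verse numeral\n"
)

matches = numeralpat.findall(text)
print(matches)
# ['I. ', 'II. ', 'III. ', '[NV]', 'IV. ', 'V. ']
```

The `|` operator has to sit outside any square brackets: inside `[...]` every character is just a member of a set, so parentheses and `|` lose their grouping meaning there.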


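You can specify alternatives like this: r'(abc|def)' looks for 'abc' or 'def'. You should also escape the square brackets so that you match the literal \[NV\] rather than a single 'N' or 'V'.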

import re

regex = r"(\[NV\]|[IVX]{1,4}\.)"

test_str = ("I. Some text\n"
    "some Text\n"
    "II. some text\n"
    "[NV] more text\n")

# finditer yields one match object per non-overlapping match
matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(
        matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Capture groups are numbered from 1; group 0 is the whole match
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(
            groupNum=groupNum, start=match.start(groupNum),
            end=match.end(groupNum), group=match.group(groupNum)))

Output:

Match 1 was found at 0-2: I.
Group 1 found at 0-2: I.
Match 2 was found at 23-26: II.
Group 1 found at 23-26: II.
Match 3 was found at 37-41: [NV]
Group 1 found at 37-41: [NV]

https://regex101.com/r/MpMxcP/1

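It looks for either the literal '[NV]' or one to four of the characters in '[IVX]' followed by a literal '.'. Note that, unlike the pattern in the question, this one does not include the trailing space.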
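To illustrate the point about escaping, here is a small sketch comparing the unescaped and escaped forms on one line of the question's sample text. The names `sample`, `unescaped` and `escaped` are made up for this example.

```python
import re

sample = "[NV]this is text with no verse numeral"

# Unescaped: [NV] is a character class that matches ONE character, 'N' or 'V'
unescaped = re.findall(r"[NV]", sample)

# Escaped: \[NV\] matches the four literal characters '[', 'N', 'V', ']'
escaped = re.findall(r"\[NV\]", sample)

print(unescaped)  # ['N', 'V']
print(escaped)    # ['[NV]']
```

The same reasoning explains why wrapping both sub-patterns in one outer pair of square brackets, as attempted in the question, cannot express "this group or that group": a character class only ever describes a single character.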
