Python regular expressions: repetition on a subgroup

import re

teststring = "This is just a string of literal text with some 0987654321 and an issue in it"

# First attempt: the group may repeat zero or more times
reg = re.compile(r"([0-9]{3})*", re.DEBUG)
outSearch = reg.search(teststring)
print("Test with ([0-9]{3})*")
if outSearch:
    print("groupSearch = " + outSearch.group())
print("")

# Second attempt: the group must occur at least once
reg = re.compile(r"([0-9]{3})+", re.DEBUG)
outSearch = reg.search(teststring)
print("Test with ([0-9]{3})+")
if outSearch:
    print("groupSearch = " + outSearch.group())

max_repeat 0 4294967295
  subpattern 1
    max_repeat 3 3
      in
        range (48, 57)
Test with ([0-9]{3})*
groupSearch = 

max_repeat 1 4294967295
  subpattern 1
    max_repeat 3 3
      in
        range (48, 57)
Test with ([0-9]{3})+
groupSearch = 098765432

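2 Answers

User

#1 · edited 2024-05-02 03:33:17

Your first pattern repeats the group with *, which means it matches zero or more occurrences of the preceding group. With +, at least one occurrence is required. A regex that consists only of an optional group is therefore tried at the very start of the string first, and if the group cannot accept the first character, it still succeeds there by matching no characters at all. This becomes clearer if you check start() and end() of each match: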

import re

teststring = "some 0987654321"

# re.DEBUG also dumps the parsed pattern; that dump is omitted from the output below
reg = re.compile(r"([0-9]{3})*", re.DEBUG)
outSearch = reg.search(teststring)

print("Test with ([0-9]{3})*")
if outSearch:
    # group() is the matched text; start() and end() give its position
    print("groupSearch = " + outSearch.group() + ' , ' + str(outSearch.start()) + ' , ' + str(outSearch.end()))

reg = re.compile(r"([0-9]{3})+", re.DEBUG)
outSearch = reg.search(teststring)

print("Test with ([0-9]{3})+")
if outSearch:
    print("groupSearch = " + outSearch.group() + ' , ' + str(outSearch.start()) + ' , ' + str(outSearch.end()))

Output:

Test with ([0-9]{3})*
groupSearch =  , 0 , 0

Test with ([0-9]{3})+
groupSearch = 098765432 , 5 , 14

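(The first regex's match starts at index 0 and ends at index 0, so it is the empty string.)

This is not specific to Python; it is the expected behavior almost everywhere: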

https://regex101.com/r/BwMWTq/1

(Switch to the other languages on that page to see that all of them, not just Python, start and end the match at index 0.)
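As a minimal sketch of the point above, the snippet below keeps the * pattern but scans the whole string with finditer() and drops the empty matches. The variable names (star, first_span, non_empty) are chosen here for illustration and do not come from the question. It is one possible way to see where the digits match, not necessarily the fix the asker needs.

```python
import re

teststring = "some 0987654321"
star = re.compile(r"([0-9]{3})*")

# search() returns the first position where the pattern can match;
# with * an empty match at index 0 already qualifies.
first = star.search(teststring)
first_span = first.span()

# finditer() keeps scanning past the empty matches.
# Filtering on a non-empty group() leaves only the run of digits.
non_empty = [(m.group(), m.span()) for m in star.finditer(teststring) if m.group()]

print(first_span)
print(non_empty)
```

If at least one group of three digits is always required, using + instead of * expresses that directly and avoids the filtering step.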

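User

#2 · edited 2024-05-02 03:33:17

In regular expressions:

+: matches one or more of the preceding token
*: matches zero or more of the preceding token

Now:

([0-9]{3})+ matches one or more (+) runs of 3 consecutive digits ([0-9]{3}), so the overall match (group 0) is the 9 digits 098765432. The final 1 of 0987654321 is left out because it cannot complete another group of three. The match covers indices 48 to 56, i.e. teststring[48:57]. You can check this with the span() method of the match object (SRE_Match), e.g. outSearch.span()
([0-9]{3})* matches zero or more (*) runs of 3 consecutive digits. Because zero repetitions are allowed, it matches at the very start of the string and stops there, returning the empty string as the overall match, i.e. a match from string index 0 to 0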
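The spans described above can be checked with a short sketch like the following, which uses the question's original test string. It also looks at group(1), the subgroup itself: a repeated capturing group keeps only its last repetition, and a group that never took part in the match returns None. The variable names are illustrative only.

```python
import re

teststring = ("This is just a string of literal text with some "
              "0987654321 and an issue in it")

plus = re.search(r"([0-9]{3})+", teststring)
star = re.search(r"([0-9]{3})*", teststring)

plus_span = plus.span()      # position of the overall match (group 0)
plus_whole = plus.group(0)   # all repetitions together
plus_last = plus.group(1)    # only the last repetition of the subgroup

star_span = star.span()      # empty match at the start of the string
star_group1 = star.group(1)  # the subgroup did not participate

print(plus_span, plus_whole, plus_last)
print(star_span, star_group1)
```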
