Python re string replacement with a shorter (non-greedy) match

Posted 2024-10-01 17:40:40


I am using Python's re module to replace substrings, for example:

>>> import re
>>> re.sub(r"a.*b","ab","acbacbacb")
'ab'

Here .* matches cbacbac, but I want it to match c three times, so that the output is ababab.

Can anyone tell me how to do this?


3 answers

Regex quantifiers are greedy by default. Use .*? instead:

>>> import re
>>> re.sub(r"a.*?b","ab","acbacbacb")
'ababab'

http://docs.python.org/library/re.html

The *, +, and ? qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
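
The quoted example can be checked directly. The following is a minimal sketch that uses only the re module and the string from the documentation; the variable names are illustrative.

```python
import re

html = "<H1>title</H1>"

# Greedy: .* runs on to the last '>' in the string.
greedy = re.match(r"<.*>", html).group()

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
lazy = re.match(r"<.*?>", html).group()

print(greedy)  # <H1>title</H1>
print(lazy)    # <H1>
```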

Use a non-greedy match:

>>> re.sub(r"a.*?b","ab","acbacbacb")
'ababab'

From http://docs.python.org/library/re.html:

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired. [...] Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched.

The simplest solution is to use the lazy (non-greedy) *? operator:

>>> re.sub(r"a.*?b","ab","acbacbacb")
'ababab'

However, this can affect performance. Because of the structure of this regular expression, you can also use the equivalent

^{pr2}$

which may perform better, depending on how well the regex optimizer handles it.
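
The equivalent pattern itself is not shown above. One pattern that fits the description, offered here as an assumption rather than as the answerer's exact code, is a negated character class: a[^b]*b consumes everything that is not b and then the first b, so it needs no backtracking. Note that [^b] also matches newlines, which . does not by default, so the two patterns are only guaranteed to agree on single-line input such as this one.

```python
import re

text = "acbacbacb"

# Lazy quantifier from the answer above.
lazy = re.sub(r"a.*?b", "ab", text)

# Assumed alternative: negated character class instead of .*?
negated = re.sub(r"a[^b]*b", "ab", text)

print(lazy)     # ababab
print(negated)  # ababab
```

Whether the second form is actually faster depends on the engine and the input; the standard timeit module can be used to measure both on representative data.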

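If you have more prior knowledge about the input, you should make the pattern even more explicit. For example, if you already know that only c can appear between a and b, you can do this: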

re.sub(r"ac*b","ab","acbacbacb")
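
To show what the stricter pattern accepts and rejects, here is a short sketch. The second input string is made up for illustration and is not from the question.

```python
import re

# Only runs of 'c' (including none) are allowed between 'a' and 'b'.
pattern = re.compile(r"ac*b")

# The string from the question: each "acb" becomes "ab".
print(pattern.sub("ab", "acbacbacb"))   # ababab

# Hypothetical input: "axb" is left alone because 'x' is not 'c'.
print(pattern.sub("ab", "accccb-axb"))  # ab-axb
```

The trade-off is that this pattern silently skips anything that does not fit the assumption, so it is only appropriate when the input format is known.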
