How do I capture runs of repeated characters with a regex?

Traceback (most recent call last):
  File "First.py", line 3, in <module>
    regex = re.compile(r'((?:[a-zA-Z0-9])\1+)')
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\re.py", line 224, in compile
    return _compile(pattern, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 829, in parse
    p = _parse_sub(source, pattern, 0)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 778, in _parse
    p = _parse_sub(source, state)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 524, in _parse
    code = _escape(source, this, state)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 415, in _escape
    len(escape))
sre_constants.error: cannot refer to an open group at position 16

3 answers

User

#1 · edited 2024-09-30 02:31:31

It's possible to do this with .findall, but it's simpler with .finditer, as shown in Jan's answer.

import re

line = "..12345678910111213141516171820212223"
regex = re.compile(r'(([a-zA-Z0-9])\2+)')

matches = [t[0] for t in regex.findall(line)]
print(matches)

Output

['111', '222']

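We use \2 because groups are numbered by the order of their opening parentheses: \1 refers to the pattern in the outer parentheses, and \2 refers to the pattern in the inner parentheses. Since .findall returns a tuple of all groups for each match, t[0] picks out the outer group, i.e. the whole run.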
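As a small illustration of that numbering rule, the sketch below shows what each group holds for a single match. The shortened sample string and the variable names are assumptions chosen for the example, not part of the original question.

```python
import re

# Shortened sample input, chosen for illustration only
sample = "10111213"
regex = re.compile(r'(([a-zA-Z0-9])\2+)')

m = regex.search(sample)
whole_run = m.group(1)    # outer group: the entire run of repeated characters
single_char = m.group(2)  # inner group: the single character being repeated
print(whole_run, single_char)

# findall returns one tuple per match, with one entry per capturing group
pairs = regex.findall(sample)
print(pairs)
```

This is why the list comprehension in the answer takes t[0] from each tuple: the second element is only the single repeated character.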

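User

#2 · edited 2024-09-30 02:31:31

A backreference cannot refer to a group that is still open: in your pattern, \1 sits inside group 1 itself, which is what the error message is complaining about. If you just want to print the repeated characters, there is a little trick using re.sub: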

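import re

line = "..12345678910111213141516171820212223"

def foo(m):
    # Called once per match; print the whole run of repeated characters
    print(m.group(0))
    return ''

_ = re.sub(r'(\w)\1+', foo, line)  # use [a-zA-Z0-9] if you don't want to match underscores

Output

111
222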
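To see the difference between the failing pattern and a working one, here is a minimal sketch that catches the compile error instead of letting it propagate. The helper name try_compile is hypothetical and only exists for this example.

```python
import re

def try_compile(pattern):
    """Return the compiled pattern, or None if the regex is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # re.error is raised for malformed patterns, such as a
        # backreference to a group that has not been closed yet
        print("invalid pattern:", exc)
        return None

# \1 appears inside group 1, which is still open at that point
broken = try_compile(r'((?:[a-zA-Z0-9])\1+)')

# Here group 1 is closed before \1 is used
working = try_compile(r'([a-zA-Z0-9])\1+')
```

The first call should report an invalid pattern and return None, while the second compiles normally.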

User

#3 · edited 2024-09-30 02:31:31

You (probably) want

([a-zA-Z0-9])\1+

See a demo on regex101.com.

In Python:

import re
line = "..12345678910111213141516171820212223"
regex = re.compile(r'([a-zA-Z0-9])\1+')

matches = [match.group(0) for match in regex.finditer(line)]
print(matches)
# ['111', '222']
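If you also need to know which character repeats, how long each run is, and where it starts, the match objects from finditer carry that information. The following is a sketch under that assumption; the runs list and its tuple layout are illustrative choices, not something the original question asked for.

```python
import re

line = "..12345678910111213141516171820212223"
regex = re.compile(r'([a-zA-Z0-9])\1+')

# For each match, record (repeated character, run length, start index)
runs = [(m.group(1), len(m.group(0)), m.start()) for m in regex.finditer(line)]
print(runs)
```

Here group(1) is the single repeated character and group(0) is the whole run, so its length is the repeat count.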
