How do I capture a run of repeated characters with a regex?

Posted 2024-09-30 02:31:31


import re
line = "..12345678910111213141516171820212223"
regex = re.compile(r'((?:[a-zA-Z0-9])\1+)')
print ("not coming here")
matches = re.findall(regex,line)
print (matches)

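In the code above, I am trying to capture groups of repeated characters.

For example, I need output like this: 111 222, and so on.

But when I run the code above, I get this error: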

Traceback (most recent call last):
  File "First.py", line 3, in <module>
    regex = re.compile(r'((?:[a-zA-Z0-9])\1+)')
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\re.py", line 224, in compile
    return _compile(pattern, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 829, in parse
    p = _parse_sub(source, pattern, 0)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 778, in _parse
    p = _parse_sub(source, state)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 524, in _parse
    code = _escape(source, this, state)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 415, in _escape
    len(escape))
sre_constants.error: cannot refer to an open group at position 16

Could someone please point out where I went wrong?


3 Answers

It is possible to do this with .findall, but it is simpler with .finditer, as shown in Jan's answer.

import re

line = "..12345678910111213141516171820212223"
regex = re.compile(r'(([a-zA-Z0-9])\2+)')

matches = [t[0] for t in regex.findall(line)]
print(matches)

Output

['111', '222']

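We use \2 because groups are numbered by the order of their opening parentheses: \1 refers to the pattern in the outer parentheses (the whole run), and \2 refers to the pattern in the inner parentheses (the single character being repeated).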
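To make the numbering concrete, here is a small sketch of what .findall returns before the list comprehension picks out t[0]. The variable name pairs is only illustrative. With two capturing groups, each result is a (group 1, group 2) tuple.

```python
import re

line = "..12345678910111213141516171820212223"
regex = re.compile(r'(([a-zA-Z0-9])\2+)')

# Each element is (outer group = whole run, inner group = repeated character).
pairs = regex.findall(line)
print(pairs)  # [('111', '1'), ('222', '2')]
```

Taking t[0] from each tuple keeps only the outer group, which is the full run.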

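A group cannot be referenced from inside itself. In your pattern, \1 appears before the parentheses of group 1 have closed, which is what "cannot refer to an open group" means. If you only want to print the repeated characters, there is a small trick using re.sub: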

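import re

line = "..12345678910111213141516171820212223"

def foo(m):
    # Print each full run of repeated characters as it is matched.
    print(m.group(0))
    return ''

# Use [a-zA-Z0-9] instead of \w if you don't want to match underscores.
_ = re.sub(r'(\w)\1+', foo, line)

Output

111
222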
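The error from the question can also be reproduced in isolation. This is a minimal sketch that compiles the original pattern and catches re.error. The variable name message is only illustrative, and the exact wording of the message may differ between Python versions.

```python
import re

message = None
try:
    # \1 sits inside group 1's own parentheses, so group 1 is still open here.
    re.compile(r'((?:[a-zA-Z0-9])\1+)')
except re.error as exc:
    message = str(exc)

print(message)
```

Moving the backreference outside the group it refers to, as in ([a-zA-Z0-9])\1+, avoids the error.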

You (probably) want

([a-zA-Z0-9])\1+

a demo on regex101.com


In Python:
import re
line = "..12345678910111213141516171820212223"
regex = re.compile(r'([a-zA-Z0-9])\1+')

matches = [match.group(0) for match in regex.finditer(line)]
print (matches)
# ['111', '222']
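If you also need to know where each run starts, the same finditer loop can return the match position. This is only a sketch: find_runs and RUN are hypothetical names introduced here, not part of the answer above.

```python
import re

# Same pattern as above: one alphanumeric character followed by one or more copies of itself.
RUN = re.compile(r'([a-zA-Z0-9])\1+')

def find_runs(text):
    """Return (run, start_index) for every run of a repeated character in text."""
    return [(m.group(0), m.start()) for m in RUN.finditer(text)]

line = "..12345678910111213141516171820212223"
print(find_runs(line))
```

Match objects from finditer carry the position along with the text, which .findall does not expose.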
