Python regular expression to match a list of numbers inside square brackets

Posted 2024-10-04 03:24:01


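I'm trying to write a function that returns all the citations (cits) in a text. Sometimes the text arrives as a list, which is why I check for that first.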

import re

def get_cits_from_note(note):
    if note:
        # The note sometimes arrives as a list of strings; join it first.
        if isinstance(note, list):
            note = "".join(note)
        matchGroups = re.findall(r'\|CITS\s*:*\s*\[\s*(\d+)', note)
        if matchGroups:
            citsList = [match for match in matchGroups]
            print(citsList)

The text looks like this (I copied and pasted it from Wikipedia, which is why it doesn't make any sense):

A bracket is a tall punctuation mark typically used in matched pairs within text, |CITS: [123],[456],[789]| to set apart or interject other text. The matched pair is best described as opening and |CITS: [999]|. Less formally, in a left-to-right context, it may be described as left and right, and in a right-to-left context, as right and left.

This is the first regular expression I built:

[code block missing from the source]

But it only prints:

[u'123']

So I wrote a second regular expression:

matchGroups = re.findall(r'\|CITS\s*:*\s*((\[\s*(\d+)]+,*\s*)+)\|', note)

But it doesn't work the way I want either, because it prints:

[(u'[123], [456], [789]', u'[789]', u'789'), (u'[999]', u'[999]', u'999')]

I've been working on this regular expression for a while and can't get it to work. Can anyone tell me what I'm missing?

The final output should be:

[u'123',u'456',u'789',u'999']
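
The tuples printed by the second attempt follow from two documented behaviours of the `re` module: `re.findall` returns one tuple per match when the pattern has several capturing groups, and a capturing group under a quantifier keeps only its last repetition. A minimal sketch on a shortened sample string (the variable names are illustrative only):

```python
import re

text = "|CITS: [123],[456],[789]|"

# Three capturing groups, so findall returns one 3-tuple per match.
# Groups 2 and 3 sit under a `+` quantifier, so each keeps only
# the text of its final repetition ("[789]" and "789").
pattern = r'\|CITS\s*:*\s*((\[\s*(\d+)]+,*\s*)+)\|'
result = re.findall(pattern, text)
print(result)
```

This is why a single pattern with a repeated group cannot return every number on its own; the earlier repetitions are overwritten as the match proceeds.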

2 Answers
import re
note = "A bracket is a tall punctuation mark typically used in matched pairs within text, |CITS: [123],[456],[789]| to set apart or interject other text. The matched pair is best described as opening and |CITS: [999]|. Less formally, in a left-to-right context, it may be described as left and right, and in a right-to-left context, as right and left."
matchGroups = re.findall(r'\d+', note)
print(matchGroups)

Output:

['123', '456', '789', '999']
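
Note that `\d+` picks up every run of digits in the note, including any that appear outside a `|CITS: ...|` marker. If that matters, one possible approach is two passes: first isolate each marker, then collect the bracketed numbers inside it. The sketch below assumes markers always open with `|CITS` and close with `|`; `extract_cits` and `sample` are hypothetical names, not part of the original code.

```python
import re

def extract_cits(note):
    """Return the numbers found inside |CITS: ...| markers (hypothetical helper)."""
    if isinstance(note, list):
        note = "".join(note)
    cits = []
    # Step 1: grab the body of each |CITS: ...| marker.
    for block in re.findall(r'\|CITS\s*:\s*([^|]*)\|', note or ""):
        # Step 2: pull each bracketed number out of that body.
        cits.extend(re.findall(r'\[\s*(\d+)\s*\]', block))
    return cits

# The year at the start is outside any marker and should be ignored.
sample = "In 2024, text |CITS: [123],[456],[789]| more text |CITS: [999]|."
print(extract_cits(sample))
```

Returning the list rather than printing it keeps the helper usable from other code.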

It's not purely a regular expression, but if I understand your goal correctly, this will do it:

raw_list = [x.strip().split(',')
            for x in re.findall(r'\|CITS\s*:([\[\]\d\s,]+)', note)]
flatten = lambda l : [item for sublist in l for item in sublist]
cits = flatten(raw_list)

However, this will also match malformed strings such as "|CITS:[[1,7[,,,".
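
If malformed markers like that should be rejected, a stricter pattern can require the exact `[n],[n],...` shape before any numbers are extracted. This is only a sketch under the assumption that a valid marker is a comma-separated list of bracketed integers closed by `|`; `STRICT_BLOCK` and `strict_cits` are hypothetical names.

```python
import re

# Accept only a well-formed marker: |CITS: [n],[n],...| with optional whitespace.
STRICT_BLOCK = re.compile(
    r'\|CITS\s*:\s*(\[\s*\d+\s*\](?:\s*,\s*\[\s*\d+\s*\])*)\s*\|'
)

def strict_cits(note):
    """Hypothetical helper: numbers from well-formed markers only."""
    return [num
            for block in STRICT_BLOCK.findall(note)
            for num in re.findall(r'\d+', block)]

print(strict_cits("|CITS: [123],[456],[789]| and |CITS: [999]|"))
print(strict_cits("|CITS:[[1,7[,,,|"))
```

The non-capturing group `(?:...)` handles the repetition, so the single capturing group still holds the whole list and nothing is lost to the last-repetition rule.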
