Python: leaving content out of a capturing regex

# text                                          # desired capture
The certolizumab pegol (Cmzia, CZP)...          'CZP'
The drug 6-mercatopureine (6-mp) ...            '6-mp'
The merits of 5-Asasdfdsf (5-ASA) ...           '5-ASA'

>>> import re

# in p1, I add the pattern to the list, separated by '|'
>>> p1 = re.compile(r'\((\S*[A-Z-0-9]\S*|\S*\s+[A-Z-0-9]+)\)')
>>> p1.findall('The certolizumab pegol (Cmzia, CZP)')
['Cmzia, CZP']

# in p2, I use a broad non-capturing group, enclosing the desired captured expressions in parentheses
>>> p2 = re.compile(r'\((?:(\S*[A-Z-0-9]\S*)|\S*\s+([A-Z-0-9]+))\)')
>>> p2.findall('The certolizumab pegol (Cmzia, CZP)')
[('', 'CZP')]

# this is an addition to the original post
# demonstrates that the non-capturing expression doesn't prevent capture of the section \S*\s+
>>> p3 = re.compile(r'\((\S*[A-Z-0-9]\S*|(?:\S*\s+)[A-Z-0-9]+)\)')
>>> p3.findall('The certolizumab pegol (Cmzia, CZP)')
['Cmzia, CZP']

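3 Answers

User

#1 · Edited 2024-10-01 04:49:58

I'm not entirely sure what you want, but I added another pair of capturing parentheses around the part that corresponds to 'CZP' and made the outer group non-capturing, which gives the following result: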

>>> p3 = re.compile(r'\((?:\S*[A-Z-0-9]\S*|[A-Z-0-9]* [A-Z-0-9]*|(?:\S*\s+)([A-Z-0-9]+))\)')
>>> p3.findall('The certolizumab pegol (Cmzia, CZP)')
['CZP']
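
To illustrate why this works, here is a minimal sketch (not part of the original answer; the variable names `whole`, `inner` and `single` are made up for the example). When a pattern contains a capturing group, `findall` returns only that group, so the outer non-capturing group still takes part in the match but is left out of the result. Note that, as written, a single-value parenthesis is matched by the first alternative, which captures nothing:

```python
import re

# Same pattern as above: the outer group is non-capturing,
# only the token after the whitespace is captured.
p3 = re.compile(
    r'\((?:\S*[A-Z-0-9]\S*|[A-Z-0-9]* [A-Z-0-9]*|(?:\S*\s+)([A-Z-0-9]+))\)'
)

m = p3.search('The certolizumab pegol (Cmzia, CZP)')
whole = m.group(0)   # the full match, parentheses included
inner = m.group(1)   # the only capturing group
print(whole)         # (Cmzia, CZP)
print(inner)         # CZP

# With a single value, the first alternative matches and group 1
# does not participate, so findall reports an empty string.
single = p3.findall('The drug 6-mercatopureine (6-mp) ...')
print(single)        # ['']
```

So this pattern handles the two-value case, but the single-value cases from the question would need their own capturing group.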

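User

#2 · Edited 2024-10-01 04:49:58

If I'm reading this correctly, the parentheses may contain one or two comma-separated values. If there are two, you only want to capture the second one. Try this: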

p = re.compile(r'\((?:[^,)]+,\s*)?([A-Za-z0-9-]+)\)')

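After the opening paren, (?:[^,)]+,\s*)? tries to match the first value, which it identifies by the comma that follows it. You don't really care what the first value looks like, as long as it doesn't contain any commas. But you can't just use [^,]+, because when there is only one value that would match too much. Adding the closing paren to the list of excluded characters keeps the match confined to a single set of parentheses.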
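
A quick check of the explanation above, as a sketch: the sample strings come from the question, while the `naive` pattern and the `mixed` string are made up here to show what goes wrong without `)` in the excluded set.

```python
import re

# Pattern from this answer: optional "first value + comma", then the capture.
p = re.compile(r'\((?:[^,)]+,\s*)?([A-Za-z0-9-]+)\)')

samples = [
    'The certolizumab pegol (Cmzia, CZP)...',
    'The drug 6-mercatopureine (6-mp) ...',
    'The merits of 5-Asasdfdsf (5-ASA) ...',
]
results = [p.findall(s) for s in samples]
print(results)       # [['CZP'], ['6-mp'], ['5-ASA']]

# Hypothetical variant that only excludes commas: [^,]+ can run past a
# closing paren and swallow the single-value group that precedes a comma.
naive = re.compile(r'\((?:[^,]+,\s*)?([A-Za-z0-9-]+)\)')
mixed = '(6-mp) and later (Cmzia, CZP)'

naive_result = naive.findall(mixed)
fixed_result = p.findall(mixed)
print(naive_result)  # ['CZP']
print(fixed_result)  # ['6-mp', 'CZP']
```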

User

#3 · Edited 2024-10-01 04:49:58

This is the simplest regex I found that achieves the goal:

>>> p = r"\((?:\S*,\s+)?(\S*)\)"  # raw string, so the backslashes reach the regex engine unchanged
>>> s = "The cert pegol (Cmzia, CZP) some words (6-mp) and (5-ASA)"
>>> re.findall(p,s)
['CZP', '6-mp', '5-ASA']

Update

The next one is more restrictive, but gives the same result:

^{pr2}$
