使用nongreedy regex捕获部分文本

2024-10-02 00:19:43 发布

您现在位置:Python中文网/ 问答频道 /正文

使用关于芬德尔,我要提取分配给每个PCR的值。你知道吗

>>> z
'PCR-09: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-13: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-16: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\n

>>> print z
PCR-09: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-13: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-16: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

起初,我试过这个,但有人能指出什么是错误的正则表达式使用?你知道吗

>>> re.search('PCR-09:(.*?)', z).groups()
('',)

在找到换行符之前,非贪婪表达式不应该匹配所有字符吗?你知道吗

通过稍微修改regex,我得到了期望的结果:

>>> re.search('PCR-09:(.*?)\s\r\n', z).groups()
(' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00',)

同样的道理,这是行不通的:

>>> re.findall(r'(PCR-\d+):(.*?)', z)
[('PCR-09', ''), ('PCR-10', ''), ('PCR-11', ''), ('PCR-12', ''), ('PCR-13', ''), ('PCR-14', ''), ('PCR-15', ''), ('PCR-16', ''), 

但事实上:

>>> re.findall(r'(PCR-\d+):(.*?)\s\r\n', z,re.DOTALL)
[('PCR-09', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-10', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-11', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-12', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-13', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-14', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-15', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-16', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'),

希望有人能解释我的方法有什么问题。你知道吗

谢谢


Tags: 方法research表达式错误字符regexgroups
3条回答

r'PCR-09:(.*?)'与您所期望的不匹配的原因是非贪婪正则表达式在有效时立即停止。你知道吗

所以(.*?)可以匹配'',所以regex立即停止。你知道吗

相反,r'(PCR-\d+):(.*?)\s\r\n'是非贪婪的,但是因为它需要找到`\s\r\n',它将强制扩展工作。你知道吗

我建议使用贪婪的正则表达式,它只包含您希望找到的字符:r'(PCR-\d+):([0-9 ]*)'。你知道吗

模式PCR-09:(.*?)告诉Python在PCR-09:之后不贪婪地匹配零个或多个字符。所以,它确实做到了这一点,并且匹配零个字符。你知道吗

你需要让你的正则表达式是贪婪的,以便使所有的东西都符合新行:

>>> re.search('PCR-09:(.*)', z).groups()
(' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r',)
>>>

请注意,您的PCR-09:(.*?)\s\r\n模式之所以有效,是因为它告诉Python在PCR-09:\s\r\n之后获取零个或多个字符。换句话说,把他们之间的一切都搞清楚。你知道吗

尝试使用:split

[ x.split(':') for x in z.split('\r\n')]

输出:

[['PCR-09', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-10', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-11', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-12', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-13', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-14', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-15', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-16', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['']]

使用正则表达式

re.findall('(PCR-\d+)(.*)',z)

相关问题 更多 >

    热门问题