Understanding Python regexp - Q&A - Python中文网

Posted 2024-09-29 03:28:13

Suppose I have the following string:

out = "someUndefinedGarbageVALUE: 12 34 23 00possiblySomeOtherGarbage"

Now I want to parse out the value "12342300". To do that, I run the following:

import re
regex = re.compile(r'VALUE: (\d\d\s?)*')
matches = regex.findall(out)

But in this case all I get is:

00

When I tweak the regular expression slightly:

regex = re.compile(r'VALUE: ((\d\d\s?)*)')

I get:

12 34 23 00, 00

My questions:

1) On http://regexpal.com/ the first expression appears to work fine. Try it yourself:

VALUE: (\d\d\s?)*

against

garbageVALUE: 05 03 04garbage

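Why is the result different in Python? Where is my reasoning wrong?

2) Why does the second expression capture exactly two groups? Shouldn't it capture just one,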

12 34 23 00

or else every possible variation?

12, 12\s, 12\s34 ...

I know the search is greedy, but why are exactly two groups captured?

1 answer

User

Reply #1 · Posted 2024-09-29 03:28:13

The difference comes from how re.findall behaves. From the documentation:

If one or more groups are present in the pattern, return a list of groups

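That explains why you get 00: it is the last thing the group (\d\d\s?) matched. A repeated capturing group keeps only its final iteration.

And: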

this will be a list of tuples if the pattern has more than one group

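((\d\d\s?)*) contains two groups, so findall returns [('12 34 23 00', '00')]: one tuple per match, with one entry per group.

You can use re.finditer instead and read the whole match from each match object.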
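To make the two documentation quotes concrete, here is a small sketch run against the string from the question. The variable names are only illustrative, and the (?:...) form is a non-capturing group, which is not in the original post but is standard re syntax.

```python
import re

out = "someUndefinedGarbageVALUE: 12 34 23 00possiblySomeOtherGarbage"

# A match object exposes both the whole match and each group.
m = re.search(r'VALUE: (\d\d\s?)*', out)
whole_match = m.group(0)   # everything the pattern matched
last_repeat = m.group(1)   # only the final iteration of the repeated group

# findall changes its return shape depending on how many groups exist.
no_groups = re.findall(r'VALUE: (?:\d\d\s?)*', out)    # list of whole matches
one_group = re.findall(r'VALUE: (\d\d\s?)*', out)      # list of group 1 values
two_groups = re.findall(r'VALUE: ((\d\d\s?)*)', out)   # list of (group 1, group 2) tuples

print(whole_match, last_repeat)
print(no_groups, one_group, two_groups)
```

A highlighting tool such as regexpal most likely shows the equivalent of group(0), which would explain why the first expression looks correct there while findall hands back only the group.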

>>> print([match.group() for match in re.finditer(r'VALUE: (\d\d\s?)*', out)])
['VALUE: 12 34 23 00']
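If the goal is the bare digits "12342300" from the question, one possible approach is to capture the whole digit run in a single outer group and strip the whitespace afterwards. This is only a sketch: parse_value is a hypothetical helper name, and the inner (?:...) is a non-capturing group so that only the outer group is reported.

```python
import re

# Outer group captures the full run of digit pairs; the inner group is
# non-capturing, so it does not add a second entry to the result.
VALUE_RE = re.compile(r'VALUE: ((?:\d\d\s?)*)')

def parse_value(text):
    """Return the digits after 'VALUE: ' with whitespace removed, or None."""
    m = VALUE_RE.search(text)
    if m is None:
        return None
    # split() with no arguments drops every whitespace character.
    return ''.join(m.group(1).split())

out = "someUndefinedGarbageVALUE: 12 34 23 00possiblySomeOtherGarbage"
print(parse_value(out))
```

The same helper returns '050304' for the regexpal test string garbageVALUE: 05 03 04garbage, and None when the text contains no VALUE: marker.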
