重复使用相同的前缀以查找下一个匹配项(如果有)

2024-09-29 17:21:15 发布

您现在位置:Python中文网/ 问答频道 /正文

我有这样的字符串:

string = '
something .... something else ...
url="/transfer/packages/00000000-0000-0000-0000-000000000000/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/something/138bb190-3b12-4855-88e2-0d1cdf46aeb5/...../...../...../...../...."
other things ...
'

没有任何CR/LF,这一切都在一条线上

我想创建一个正则表达式,它:

  • 当且仅当url以/transfer/packages/开头时
  • 捕获每个后续GUID
  • 直到带引号的字符串的结尾"
  • 要找到的GUID数未知,至少为一个

到目前为止,我写道:

\/transfer\/packages\/[^"]*([A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12})"

但它只捕获最后一个guid。我需要一些如何重用前缀/transfer/packages/,并保持匹配,每次都急切地扩展搜索,而不离开前缀


Tags: 字符串urlstringpackageselsesomethingtransfercr
3条回答

从这个SOanswer

As for the second question, it is a common problem. It is not possible to get an arbitrary number of captures with a PCRE regex, as in case of repeated captures only the last captured value is stored in the group buffer. You cannot have more submatches in the resulting array than the number of capturing groups inside the regex pattern. See Repeating a Capturing Group vs. Capturing a Repeated Group for more details.

如果您在Python中使用re模块,那么可能使用str.startwith并尝试:

import re
url="/transfer/packages/00000000-0000-0000-0000-000000000000/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/something/138bb190-3b12-4855-88e2-0d1cdf46aeb5/...../...../...../...../...."
if url.startswith('/transfer/packages/'):
    Guid_List = re.findall(r'(?i)[a-z0-9]{8}(?:-[a-z0-9]{4}){3}-[a-z0-9]{12}', url)
print(Guid_List)

您可以使用PyPi regex module,它在lookback中支持无限长量词:

(?<=url="/transfer/packages/[^\r\n"]*)[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}(?=[^\r\n"]*")

示例Regex demo(为演示目的选择了另一个引擎)或参见Python demo


另一个选项是首先匹配带有url="/transfer/packages/后跟guid的行,然后匹配到下一个双引号

然后您可以使用例如re.findall来获取所有的guid

"/transfer/packages/[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}[^"\r\n]*"

Regex demoPython demo

例如:

import re

regex = r'"/transfer/packages/[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}[^"\r\n]*"'
test_str = ("something .... something else ...\n"
    "url=\"/transfer/packages/00000000-0000-0000-0000-000000000000/connectors/68f74d66-ca3d-4272-9b59-4f737946b3f7/something/138bb190-3b12-4855-88e2-0d1cdf46aeb5/...../...../...../...../....\"\n"
    "other things ...\n\n"
    "68f74d66-ca3d-4272-9b59-4f737946b300")

for str in re.findall(regex, test_str):
    print(re.findall(r"[A-Za-z0-9]{8}-(?:[A-Za-z0-9]{4}-){3}[A-Za-z0-9]{12}", str))

输出

['00000000-0000-0000-0000-000000000000', '68f74d66-ca3d-4272-9b59-4f737946b3f7', '138bb190-3b12-4855-88e2-0d1cdf46aeb5']

相关问题 更多 >

    热门问题