Regular expression with two groups to match all patterns

Tortillas Bolsa 2a 1kg 4118
Tortillinas 50p 1 31Kg TAB TR 46113
Bollos BK 4in 36p 1635g SL 131
Super Pan Bco Ajonjoli 680g SP WON 100
Pan Blanco Bimbo Rendidor 567g BIM 49973
Gansito ME 5p 250g MTA MLA 49860

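3 Answers

User

Answer #1 · edited 2024-10-01 07:41:45

Precompile a regex pattern that uses a lookahead (explained below), then call the compiled pattern's match method inside a list comprehension: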

>>> import re
>>> p = re.compile(r'\D+?(?=\s*([A-Z]{2})?\s*\d)')
>>> [p.match(x).group() for x in data]

[
 'Tortillas Bolsa',
 'Tortillinas',
 'Bollos',
 'Super Pan Bco Ajonjoli',
 'Pan Blanco Bimbo Rendidor',
 'Gansito'
]

Here, data is a list of strings.

Details

\D+?            # one or more non-digit characters (non-greedy)
(?=             # lookahead: assert what follows without consuming it
\s*             # zero or more whitespace chars
([A-Z]{2})?     # an optional pair of uppercase letters
\s*             # zero or more whitespace chars
\d              # a digit
)
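The annotated pattern above can be compiled as written by passing re.VERBOSE, which makes the regex engine ignore the layout whitespace and the # comments. This is only a minimal sketch: the three sample strings are taken from the question, and splitting them into separate list items is an assumption.

```python
import re

# Same pattern as above; re.VERBOSE lets the comments live inside the regex.
p = re.compile(r"""
    \D+?            # one or more non-digit characters (non-greedy)
    (?=             # lookahead: what follows must be...
      \s*           #   optional whitespace
      ([A-Z]{2})?   #   an optional two-letter uppercase code
      \s*           #   optional whitespace
      \d            #   a digit
    )
""", re.VERBOSE)

# Sample strings from the question (item boundaries are assumed).
data = [
    "Tortillas Bolsa 2a 1kg 4118",
    "Bollos BK 4in 36p 1635g SL 131",
    "Gansito ME 5p 250g MTA MLA 49860",
]

# group() returns the whole match; the lookahead text is not part of it.
names = [p.match(x).group() for x in data]
print(names)  # ['Tortillas Bolsa', 'Bollos', 'Gansito']
```

Because the lookahead does not consume anything, the optional two-letter code and the digit are checked but never included in the result.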

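If any string does not contain the pattern you are looking for, the list comprehension fails with an AttributeError, because p.match returns None in that case and None has no group method. You can use a loop instead and test the result of p.match before extracting the matched part.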

matches = []
for x in data:
    m = p.match(x)
    if m:
        matches.append(m.group())

Or, if you need a None placeholder when there is no match:

matches = []
for x in data:
    m = p.match(x)  # None when the pattern does not match
    matches.append(m.group() if m else None)
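The None handling can also be folded into a small helper so the list comprehension form still works. This is a sketch only: extract_name is a hypothetical name, and the second and third sample strings are made up here to show the no-match cases.

```python
import re

p = re.compile(r'\D+?(?=\s*([A-Z]{2})?\s*\d)')

def extract_name(s):
    """Return the leading name, or None when the pattern does not match."""
    m = p.match(s)
    return m.group() if m else None

# One string from the question plus two invented strings that cannot match:
# one contains no digit at all, the other starts with a digit.
data = ["Gansito ME 5p 250g", "no digits here", "123 starts with a digit"]
result = [extract_name(x) for x in data]
print(result)  # ['Gansito', None, None]
```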

User

Answer #2 · edited 2024-10-01 07:41:45

My two cents:

^.*?(?=\s[\d]|\s[A-Z]{2,})

https://regex101.com/r/7xD7DS/1/
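To try this pattern outside regex101, it could be applied in Python roughly as follows. This is a sketch under assumptions: the variable names p2, samples, and found are illustrative, and the sample strings are taken from the question.

```python
import re

# The pattern from this answer: lazily take everything from the start of the
# string up to a whitespace followed by a digit or by 2+ uppercase letters.
p2 = re.compile(r'^.*?(?=\s[\d]|\s[A-Z]{2,})')

samples = [
    "Tortillinas 50p 1 31Kg TAB TR 46113",
    "Super Pan Bco Ajonjoli 680g SP WON 100",
    "Gansito ME 5p 250g MTA MLA 49860",
]

found = []
for s in samples:
    m = p2.match(s)
    found.append(m.group() if m else None)
print(found)  # ['Tortillinas', 'Super Pan Bco Ajonjoli', 'Gansito']
```

Note that this pattern stops at any word made of two or more uppercase letters, so a product name containing an all-caps word would be cut short at that word.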

User

Answer #3 · edited 2024-10-01 07:41:45

You can use a lookahead:

import re

# Raw strings keep backslashes such as \s intact for the regex engine.
I_WANT        = r'(.+?)' # This is what you want
I_DO_NOT_WANT = r'\s(?:[0-9]|(?:[A-Z]{2,3}\s))' # Stop-patterns
RE = '{}(?={})'.format(I_WANT, I_DO_NOT_WANT) # Combine the parts

[re.findall(RE, x)[0] for x in test_strings]
#['Tortillas Bolsa', 'Tortillinas', 'Bollos', 'Super Pan Bco Ajonjoli',
# 'Pan Blanco Bimbo Rendidor', 'Gansito']
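One caveat with indexing the findall result: re.findall returns an empty list when nothing matches, so [0] raises an IndexError. A more defensive variant, sketched here with re.search and an invented no-match string ("Plain"), might look like this:

```python
import re

I_WANT = r'(.+?)'                                 # what you want
I_DO_NOT_WANT = r'\s(?:[0-9]|(?:[A-Z]{2,3}\s))'   # stop-patterns
RE = re.compile('{}(?={})'.format(I_WANT, I_DO_NOT_WANT))

# First string is from the question; "Plain" is a made-up string with no stop-pattern.
test_strings = ["Bollos BK 4in 36p 1635g SL 131", "Plain"]

names = []
for x in test_strings:
    m = RE.search(x)                 # None instead of an empty list on failure
    names.append(m.group(1) if m else None)
print(names)  # ['Bollos', None]
```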
