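<p>When parsing, <code>split()</code> usually works best when you want to discard the data you are splitting on. You want to keep it, so a capturing approach is probably a better fit.</p>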
<pre><code>import re
orig_vals = [
    'Champiñón 200 g',
    'Zapallo italiano Unid.',
    'Bolsa de zanahoria 1 kg',
    'Papa malla 2 Kg',
    'Palta Hass granel',
    'Limón malla 1 kg',
    'Tomate granel',
    'Brócoli 1 un.',
    'Tomate unid',
]
# We will capture the two parts of interest and
# only throw away a space in the middle. This regex is
# not super robust, but it does work correctly for the
# example data you have supplied.
rgx = re.compile(r'(.+) ((\d|unid).*)', re.IGNORECASE)
new_vals = []
for ov in orig_vals:
    m = rgx.search(ov)
    # Keep both captured parts on a match; otherwise keep the value unchanged.
    new_vals.extend([m.group(1).rstrip(), m.group(2)] if m else [ov])
</code></pre>
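<p>To make the result of the capturing approach easier to inspect, the loop body can be wrapped in a small helper. This is only a sketch: <code>split_name_qty</code> is a hypothetical name, not part of the original answer, and the sample inputs are taken from the list above.</p>

```python
import re

# Same pattern as above, written as a raw string.
rgx = re.compile(r'(.+) ((\d|unid).*)', re.IGNORECASE)

def split_name_qty(value):
    """Return [name, quantity] when a quantity is found, else [value].

    Hypothetical helper for illustration only.
    """
    m = rgx.search(value)
    if m is None:
        return [value]
    return [m.group(1).rstrip(), m.group(2)]

for sample in ['Champiñón 200 g', 'Zapallo italiano Unid.', 'Tomate granel']:
    print(split_name_qty(sample))
# ['Champiñón', '200 g']
# ['Zapallo italiano', 'Unid.']
# ['Tomate granel']
```

<p>Because <code>.+</code> is greedy, the match breaks at the last space that is followed by a digit or by "unid", which is what the sample data needs.</p>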
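<p>If you really want to use split, you can write a more complex regex that uses a lookahead, so the text we split on is not consumed and therefore not thrown away.</p>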
<pre><code>rgx2 = re.compile(r'(.+?) +(?=\d|unid)', re.IGNORECASE)
new_vals2 = [
    part
    for ov in orig_vals
    for part in rgx2.split(ov)
    if part
]
</code></pre>
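<p>The effect of the lookahead is easiest to see side by side. The sketch below uses one value from the sample data; the names <code>consuming</code>, <code>lookahead</code> and <code>with_group</code> are illustrative, and the first two patterns are simplified variants rather than the exact regex above.</p>

```python
import re

text = 'Papa malla 2 Kg'

# Without a lookahead the digit is part of the match, so split() drops it.
consuming = re.compile(r' +(?:\d|unid)', re.IGNORECASE)
print(consuming.split(text))   # ['Papa malla', ' Kg']

# With a lookahead the digit is only checked, not consumed, so it survives.
lookahead = re.compile(r' +(?=\d|unid)', re.IGNORECASE)
print(lookahead.split(text))   # ['Papa malla', '2 Kg']

# A capture group in the pattern makes split() return the captured text too.
# Here the match starts at index 0, which leaves an empty string in front;
# that is why the comprehension above filters with `if part`.
with_group = re.compile(r'(.+?) +(?=\d|unid)', re.IGNORECASE)
print(with_group.split(text))  # ['', 'Papa malla', '2 Kg']
```

<p>For this data the plain lookahead without a capture group appears to give the same parts with no empty strings to filter, though that has only been checked here against values shaped like the samples.</p>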