Python regular expressions: extracting a volume (mL) from a string

Posted 2024-09-30 06:13:06

I have the following strings, from which I want to extract the volume (matching only mL, not mg/mL):

test = [
"10ML", # 10
"10 ML", # 10
"10.5ML", # 10.5
"1MG/1ML", # [] not match
"1MG/10ML", # [] not match
"10MG/0.5ML", # [] not match
"   10ML and 15ML  ", # 10, 15
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
"NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]

This is my current pattern and its result:

import re

# Raw string so that \s and \/ reach the regex engine unchanged
pattern = re.compile(r"(?<!\/)([0-9]*[.]*[0-9]+)\s*ML(?![\/A-z])")

for s in test:
    print(s, '>>', pattern.findall(s))

10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> ['0'] # Wrong []
10MG/0.5ML >> ['.5'] # Wrong []
   10ML and 15ML   >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10', '30'] # Wrong ['10']

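As you can see, I get wrong results for ["1MG/10ML", "10MG/0.5ML", "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"]. They should be [[], [], ['10']].

I have tried to fix my pattern but still cannot figure it out. Please help me correct it. Thanks, everyone!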


3 answers

Another option that may be easier to read:

(?<![/\d-])(\d+\.*\d+)\s*ML\b
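
One caveat is worth checking before relying on this pattern: \d+\.*\d+ needs at least two digits, so it behaves differently on inputs outside the question's list. The sketch below runs the pattern unchanged; the last two sample strings are hypothetical additions of mine, not from the original post.

```python
import re

# The pattern from the answer above, unchanged.
pattern = re.compile(r'(?<![/\d-])(\d+\.*\d+)\s*ML\b')

# Input string -> result of findall. The first three come from the question;
# the last two are hypothetical extras used only to probe edge cases.
samples = {
    "10.5ML": ['10.5'],
    "1MG/10ML": [],
    "10MG/0.5ML": [],
    "5ML": [],              # a single-digit volume is missed
    "10MG/0.25ML": ['25'],  # a longer decimal after a slash leaks through
}

results = {s: pattern.findall(s) for s in samples}
for s, found in results.items():
    print(repr(s), '>>', found)
```

If single-digit volumes or longer decimals can occur in your data, a pattern that also rejects a digit followed by a dot on the left is probably the safer choice.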

For details on the components of the regular expression below, see this RegExr link.

import re

test = [
    "10ML", # 10
    "10 ML", # 10
    "10.5ML", # 10.5
    "1MG/1ML", # [] not match
    "1MG/10ML", # [] not match
    "10MG/0.5ML", # [] not match
    "   10ML and 15ML  ", # 10, 15
    "LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
    "NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
    "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]

for s in test:
    # Print each result; a bare findall call in a script would show nothing
    print(re.findall(r'(?<![\-\/])(\d+(?:\.?\d+)) *ML\b', s))

Output

['10']
['10']
['10.5']
[]
[]
[]
['10', '15']
[]
['1000']
['10']

You can use

(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)

Python regex demo

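Details

  • (?<![/\d]): a / or a digit is not allowed immediately to the left of the current position
  • (?<!\d[.-]): a digit followed by . or - is not allowed immediately to the left of the current position
  • (\d+(?:\.\d+)?): group 1, one or more digits followed by an optional sequence of a . and one or more digits
  • \s*: zero or more whitespace characters
  • ML\b: ML followed by a word boundary, so ML is matched as a whole word
  • (?!/): a / is not allowed immediately to the right of the current position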
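
The breakdown above can also be kept inside the pattern itself. This is a minimal sketch, assuming you are free to compile with re.VERBOSE; the name VOLUME_ML is my own and does not appear in the answer.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE
# so that each component carries its own comment.
VOLUME_ML = re.compile(r"""
    (?<![/\d])        # not preceded by "/" or a digit
    (?<!\d[.-])       # not preceded by a digit followed by "." or "-"
    (\d+(?:\.\d+)?)   # group 1: an integer or decimal number
    \s*               # optional whitespace
    ML\b              # "ML" ending at a word boundary
    (?!/)             # not followed by "/"
""", re.VERBOSE | re.ASCII)

print(VOLUME_ML.findall("110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"))  # ['10']
print(VOLUME_ML.findall("10MG/0.5ML"))  # []
```

In verbose mode, whitespace and # comments outside character classes are ignored, which is why the layout does not change what the pattern matches.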

Python demo

import re
pattern = re.compile(r'(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)', re.A)
test = ["10ML", "10 ML", "10.5ML", "1MG/1ML", "1MG/10ML", "10MG/0.5ML", "   10ML and 15ML  ",
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", "NSS.0.9% 1000 ML (PLASTIC BAG)", 
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"]
for i, s in enumerate(test):
    print(test[i], '>>' , pattern.findall(s))

Output:

10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> []
10MG/0.5ML >> []
   10ML and 15ML   >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10']
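
If the volumes are needed as numbers, or the unit may be written as "ml" or "mL", the same pattern can be wrapped in a small helper. This is only a sketch under those assumptions: the function name extract_ml and the added re.I flag are mine, and the question's data only contains the upper-case form.

```python
import re

# The answer's pattern, with re.I added (an assumption) to accept ml / mL / ML.
_ML = re.compile(r'(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)', re.A | re.I)

def extract_ml(text):
    """Return every standalone mL volume found in text as a float."""
    return [float(m) for m in _ML.findall(text)]

print(extract_ml("10.5ML"))               # [10.5]
print(extract_ml("   10ml and 15 mL  "))  # [10.0, 15.0]
print(extract_ml("5 mg/mL injection"))    # []
```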
