I have the following strings and want to extract the volume from them (matching only ML, not MG/ML):
test = [
"10ML", # 10
"10 ML", # 10
"10.5ML", # 10.5
"1MG/1ML", # [] not match
"1MG/10ML", # [] not match
"10MG/0.5ML", # [] not match
" 10ML and 15ML ", # 10, 15
"LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION", # []
"NSS.0.9% 1000 ML (PLASTIC BAG)", # 1000
"110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML", # 10
]
Here is my current pattern and the results it gives:
import re

pattern = re.compile(r"(?<!\/)([0-9]*[.]*[0-9]+)\s*ML(?![\/A-z])")
for s in test:
    print(s, '>>', pattern.findall(s))
10ML >> ['10']
10 ML >> ['10']
10.5ML >> ['10.5']
1MG/1ML >> []
1MG/10ML >> ['0'] # Wrong, expected []
10MG/0.5ML >> ['.5'] # Wrong, expected []
10ML and 15ML >> ['10', '15']
LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION >> []
NSS.0.9% 1000 ML (PLASTIC BAG) >> ['1000']
110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML >> ['10', '30'] # Wrong, expected ['10']
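As you can see, I get wrong results for
["1MG/10ML", "10MG/0.5ML", "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML"].
They should be [[], [], ['10']].
I tried to fix my pattern but still could not figure it out. Please help me correct it. Thanks, everyone!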
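One way to see why the first two of these cases fail is to print where each match starts. A lookbehind such as (?<!\/) is only checked at the position where a match begins, so the engine is free to begin the match one character later. The sketch below reuses the pattern from the question unchanged apart from the raw-string prefix; `match_spans` is a hypothetical helper introduced only for illustration.

```python
import re

# Pattern from the question (raw string added; behaviour is the same).
pattern = re.compile(r"(?<!\/)([0-9]*[.]*[0-9]+)\s*ML(?![\/A-z])")


def match_spans(s):
    """Hypothetical helper: (start index, captured text) for every match in s."""
    return [(m.start(1), m.group(1)) for m in pattern.finditer(s)]


# In "1MG/10ML" the match cannot start at the "1" right after "/",
# but it can start at the following "0", whose left neighbour is a digit.
print(match_spans("1MG/10ML"))    # [(5, '0')]

# In "10MG/0.5ML" the match starts at ".", whose left neighbour is "0".
print(match_spans("10MG/0.5ML"))  # [(6, '.5')]
```

In both strings the reported start index lies after the "/", which suggests that the lookbehind also has to rule out a digit (or a digit followed by "." or "-") on the left, not only a slash.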
Another option that may be easier to read:
For details on the components of the regex below, see this RegExr link
Output
You can use
See the Python regex demo
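Details:
(?<![/\d]) - no "/" or digit is allowed immediately to the left of the current position
(?<!\d[.-]) - no digit followed by "." or "-" is allowed immediately to the left of the current position
(\d+(?:\.\d+)?) - Group 1: one or more digits, followed by an optional sequence of a "." and one or more digits
\s* - zero or more whitespace characters
ML\b - ML as a whole word
(?!/) - no "/" is allowed immediately to the right of the current position
See the Python demo:
Output: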
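The pattern itself is not reproduced on this page, so the following is only a sketch. It assumes the full expression is the components from the details list joined in the order they are listed, and it runs that assembled pattern against the test list from the question.

```python
import re

# Assumption: the components from the details list, concatenated in order.
pattern = re.compile(r"(?<![/\d])(?<!\d[.-])(\d+(?:\.\d+)?)\s*ML\b(?!/)")

# Test strings copied from the question.
test = [
    "10ML",
    "10 ML",
    "10.5ML",
    "1MG/1ML",
    "1MG/10ML",
    "10MG/0.5ML",
    " 10ML and 15ML ",
    "LODEXA (DEXAMETHASONE) 5 MG/ML INJECTION",
    "NSS.0.9% 1000 ML (PLASTIC BAG)",
    "110 MLM HIDRASEC (RACECADOTIL)10 ML POWDER FOR 1-30 ML",
]

for s in test:
    # findall returns the text of Group 1 for every match.
    print(s, ">>", pattern.findall(s))
```

Under that assumption, each line agrees with the expected values given in the question's comments: "1MG/10ML" and "10MG/0.5ML" yield empty lists, and the last string yields only ['10'], because "110 MLM" fails the word boundary after ML and "30" is preceded by a digit plus "-".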