Using a regular expression to remove semicolon-delimited text from a string

Posted 2024-09-28 16:52:33


I am learning regular expressions for the first time and have run into the following problem, which I am struggling to solve.

Consider the following paragraph:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec eget eros
libero. Duis ac diam pretium velit volutpat facilisis in vel nibh. In lacinia ; neque 
massa, in consectetur lectus ; faucibus vel. Maecenas ; dapibus leo nec ; elit sagittis 
convallis. Sed at lacus consectetur, eleifend urna tristique, consequat orci. Nullam 
ac orci quis elit pellentesque consectetur quis ac libero. Duis lorem sem, sodales ; ut 
massa sed, porta facilisis ex. Aliquam cursus accumsan ante sed maximus. 

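I want to delete all text enclosed between a pair of semicolons. The catch is that the enclosed text can span multiple lines, and if a period is reached before the matching semicolon, the text should be kept. For example, the output for the paragraph above should look like this: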

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec eget eros
libero. Duis ac diam pretium velit volutpat facilisis in vel nibh. In lacinia faucibus 
vel. Maecenas elit sagittis convallis. Sed at lacus consectetur, eleifend urna tristique, 
consequat orci. Nullam ac orci quis elit pellentesque consectetur quis ac libero. Duis 
lorem sem, sodales ; ut massa sed, porta facilisis ex. Aliquam cursus accumsan ante sed 
maximus. 

After searching Google for a while I found the re.MULTILINE mode, but I don't think it is what I need. Any help would be greatly appreciated.


1 answer
User
Reply #1 · Posted 2024-09-28 16:52:33
;[^;.]*;

Use this pattern and replace every match with an empty string. See the demo:

https://regex101.com/r/yX8zV8/3
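
The pattern can be read piece by piece. The sketch below restates it with re.VERBOSE and runs it on a short made-up string (`sample` is an illustrative assumption, not text from the question). It also suggests why re.MULTILINE is not needed here: that flag only changes how ^ and $ behave, while a negated character class such as [^;.] already matches newline characters.

```python
import re

# The same pattern as above, spelled out with re.VERBOSE for readability.
pattern = re.compile(r"""
    ;        # opening semicolon
    [^;.]*   # any run of characters except ';' and '.', newlines included
    ;        # closing semicolon
""", re.VERBOSE)

# Hypothetical sample: one span crossing a line break, and one lone
# semicolon that is followed by a period before any closing semicolon.
sample = "keep ; drop\nthis ; keep. Stop ; here. End"

cleaned = pattern.sub("", sample)

# Adding re.MULTILINE makes no difference, because the pattern has no ^ or $.
with_flag = re.sub(r";[^;.]*;", "", sample, flags=re.MULTILINE)

print(repr(cleaned))
print(cleaned == with_flag)
```

The span that crosses the line break is removed, while the lone semicolon before "here." stays because a period comes before any closing semicolon. The spaces on both sides of the removed span are left in place.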

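import re

# Match an opening ';', any run of characters other than ';' or '.', then a closing ';'.
# The negated class [^;.] also matches newlines, so no flag is needed for multi-line spans.
p = re.compile(r';[^;.]*;')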
test_str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec eget eros\nlibero. Duis ac diam pretium velit volutpat facilisis in vel nibh. In lacinia ; neque \nmassa, in consectetur lectus ; faucibus vel. Maecenas ; dapibus leo nec ; elit sagittis \nconvallis. Sed at lacus consectetur, eleifend urna tristique, consequat orci. Nullam \nac orci quis elit pellentesque consectetur quis ac libero. Duis lorem sem, sodales ; ut \nmassa sed, porta facilisis ex. Aliquam cursus accumsan ante sed maximus. "
subst = ""

# Replace every match with the empty string.
result = p.sub(subst, test_str)
print(result)
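
One detail worth checking: removing only the ;...; span leaves the spaces on both sides of it, so the result contains double spaces where the expected output in the question shows single ones. A minimal sketch of one possible adjustment follows, assuming it is acceptable to also drop any whitespace directly before the opening semicolon. The helper name `strip_semicolon_spans` and the `sample` string are hypothetical, introduced only for illustration.

```python
import re


def strip_semicolon_spans(text: str) -> str:
    """Remove ';...;' spans that contain no '.', plus whitespace before the opening ';'."""
    # \s* consumes the whitespace in front of the opening semicolon, so only
    # the whitespace that follows the closing semicolon remains.
    return re.sub(r"\s*;[^;.]*;", "", text)


# Hypothetical sample with one removable span and one lone semicolon.
sample = "In lacinia ; neque \nmassa ; faucibus vel. Duis ; ut massa."
print(strip_semicolon_spans(sample))
```

Because \s also matches newlines, a line break directly before an opening semicolon would be removed as well. Line breaks elsewhere are left alone, so this does not re-wrap the paragraph the way the expected output in the question is wrapped.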
