Regex to replace some dots with commas in customer reviews

Posted 2024-09-28 17:02:16

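I need to write a regular expression that replaces some '.' with ',' in patients' comments about drugs. After mentioning a side effect they should have used a comma, but some of them used a dot. For example: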

text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache.  I suffered. she suffered. she told I should change it."

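I wrote regex code to detect one word (such as headache) or two words (such as night mare) surrounded by two dots.

To detect a single word surrounded by two dots: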

text = re.sub(r'(\.)(\s*\w+\s*\.)', r',\2 ', text)

To detect two words surrounded by two dots:

text = re.sub(r'(\.)(\s*\w+\s\w+\s*\.)', r',\2 ', text)

This is the output:

the drug side-effects are: night mare, nausea,  night sweat.  bad dream, dizziness,  severe headache.   I suffered, she suffered.  she told I should change it.

But it should be:

the drug side-effects are: night mare, nausea,  night sweat,  bad dream, dizziness,  severe headache.   I suffered. she suffered.  she told I should change it.

My code does not replace the dot after "night sweat" with ','. In addition, if a sentence starts with a subject pronoun (such as I and she), I do not want to change the dot before it to a comma, even if the sentence has only two words (such as "I suffered"). I don't know how to add this condition to my code.

Any suggestions? Thank you!


1 answer
User
#1 · Posted 2024-09-28 17:02:16

You can use the following pattern:

\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)

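This matches a dot, then captures one or two words, where the first word is not one of the pronouns you mentioned (you will most likely need to extend that list). The captured words must be followed by a character that is neither a word character nor whitespace (for example . ! : ,) or by the end of the string. Because that check is a lookahead, the following character is not consumed, so the next dot can still start a new match.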

You then have to replace the match with ,\1
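As a reading aid, the same pattern can be spelled out with `re.VERBOSE` so that each part carries a comment. This is only a sketch: the name `PATTERN` and the short sample string are made up for illustration, and the pronoun list is still just `i|she`.

```python
import re

# Annotated copy of the pattern above; whitespace and comments inside
# the pattern are ignored because of re.VERBOSE.
PATTERN = re.compile(r"""
    \.                      # the dot that may become a comma
    (                       # group 1: the text kept after the comma
        \s*                 #   optional whitespace
        (?!(?:i|she)\b)     #   the first word must not be one of these pronouns
        \w+                 #   first word
        (?:\s+\w+)?         #   optional second word
        \s*                 #   optional trailing whitespace
    )
    (?=[^\w\s]|$)           # followed by punctuation or end of string, not consumed
""", re.VERBOSE | re.IGNORECASE)

sample = "dizziness. severe headache. I suffered."
print(PATTERN.sub(r",\1", sample))
# prints: dizziness, severe headache. I suffered.
```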

In Python:

import re
text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache.  I suffered. she suffered. she told I should change it."
text = re.sub(r'\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)', r',\1', text, flags=re.I)
print(text)

Output:

the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache.  I suffered. she suffered. she told I should change it.
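The reason the attempt in the question skipped every other item appears to be that its pattern consumed the closing dot, and `re.sub` only replaces non-overlapping matches. The following minimal sketch, using a made-up string `sample` and simplified one-word patterns, contrasts a consumed closing dot with a lookahead:

```python
import re

sample = "a. b. c. d."

# Closing dot is consumed: the next match cannot start at that dot,
# so only every other dot is replaced.
consuming = re.sub(r'\.(\s*\w+\s*)\.', r',\1.', sample)

# Closing dot is only checked by a lookahead, so it stays available
# as the start of the next match.
lookahead = re.sub(r'\.(\s*\w+\s*)(?=\.)', r',\1', sample)

print(consuming)  # a, b. c, d.
print(lookahead)  # a, b, c, d.
```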

This is probably not completely fail-safe, and you may need to extend the pattern for some edge cases.
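One possible way to extend the pronoun list is to build the alternation from a Python list. The sketch below is an assumption, not part of the original answer: the names `PRONOUNS` and `dots_to_commas` and the list contents are hypothetical and should be adapted to your data.

```python
import re

# Hypothetical list of subject pronouns; extend it as needed.
PRONOUNS = ["i", "you", "he", "she", "it", "we", "they"]

def dots_to_commas(text, pronouns=PRONOUNS):
    """Replace a dot with a comma when it is followed by one or two
    words that do not start with one of the given pronouns."""
    alternation = "|".join(re.escape(p) for p in pronouns)
    pattern = re.compile(
        rf'\.(\s*(?!(?:{alternation})\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)',
        flags=re.I,
    )
    return pattern.sub(r',\1', text)

print(dots_to_commas(
    "side effects: nausea. night sweat. dizziness. We stopped. they agreed."
))
# prints: side effects: nausea, night sweat, dizziness. We stopped. they agreed.
```

Short sentences that start with any other word (for example a name) would still be joined with a comma, so the list is only a partial safeguard.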
