Regex to match paragraphs that end with a period

Diagnosis of one of the following: A) Neovascular (wet) age-related macular degeneration OR B) Macular edema following retinal vein occlusion, OR C) Diabetic macular edema OR D) Diabetic retinopathy in patients with diabetic macular edema. More text here. PA Criteria Criteria Details Eylea (s) Products Affected  EYLEA Exclusion Criteria Required Medical Information Age Restrictions Prescriber Restrictions Coverage Duration Other Criteria Off Label Uses 12 months Indications All Medically-accepted Indications. Formulary ID 20276, Version 12 101

Diagnosis of one of the following: A) Neovascular (wet) age-related macular degeneration OR B) Macular edema following retinal vein occlusion, OR C) Diabetic macular edema OR D) Diabetic retinopathy in patients with diabetic macular edema.

3 answers

Anonymous user

Answer 1 · edited 2024-09-28 20:17:27

Here is a simple solution that doesn't need any modules:

doc = '...'

# keep only the paragraphs that don't end with a period
ps = '\n\n'.join([p for p in doc.split('\n\n') if not p.endswith('.')])

This keeps exactly the same formatting as the original.

If you want it tidier (this also drops empty paragraphs):

ps = '\n\n'.join([p for p in doc.split('\n\n') if not p.endswith('.') and p.strip()])
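A runnable sketch of the two one-liners above. The sample string `doc` is made up for illustration (the answer leaves it as `'...'`); it contains a run of four newlines so the difference between the two variants is visible.

```python
# Hypothetical sample: splitting on '\n\n' yields four blocks; the second
# block is empty because the input contains a run of four newlines.
doc = "First paragraph\n\n\n\nSecond one ends here.\n\nThird paragraph"

# As in the answer: drop every paragraph that ends with '.'.
ps = '\n\n'.join([p for p in doc.split('\n\n') if not p.endswith('.')])

# Tidier variant: additionally drop empty or whitespace-only paragraphs.
ps_tidy = '\n\n'.join([p for p in doc.split('\n\n') if not p.endswith('.') and p.strip()])

print(repr(ps))       # 'First paragraph\n\n\n\nThird paragraph'
print(repr(ps_tidy))  # 'First paragraph\n\nThird paragraph'
```

A paragraph that ends with a period followed by trailing spaces is not caught by `p.endswith('.')`; testing `p.rstrip().endswith('.')` instead would handle that case.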

Anonymous user

Answer 2 · edited 2024-09-28 20:17:27

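((.+\n)*(.*\.\n)) should do the trick, as demonstrated here

(.+\n) captures a line of one or more characters, including its trailing newline

(.+\n)* does this zero or more times

((.+\n)*(.*\.\n)) then also includes a line made of zero or more characters followed by a period and a newline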
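Applying this pattern from Python might look like the sketch below; the sample `text` is invented for illustration. Because the pattern contains capture groups, `re.findall` would return tuples of groups, so the sketch uses `finditer` and takes `group(0)`.

```python
import re

# Answer 2's pattern: zero or more non-empty lines, then a line that
# ends with '.' immediately followed by '\n'.
pattern = re.compile(r'((.+\n)*(.*\.\n))')

# Invented sample: no line in the first paragraph ends with '.'.
text = (
    "PA Criteria Details\n"
    "Version 12 101\n"
    "\n"
    "Diagnosis of one of the following:\n"
    "A) Neovascular (wet) AMD.\n"
)

# group(0) is the whole match; findall() would give 3-tuples of groups instead.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['Diagnosis of one of the following:\nA) Neovascular (wet) AMD.\n']

# Without a trailing newline the final period has no '\n' after it,
# so the pattern finds nothing.
print(pattern.search("Last paragraph ends here."))  # None
```

The pattern works line by line rather than paragraph by paragraph: in a paragraph such as `'abc.\ndef\n'` it matches `'abc.\n'`, even though the paragraph as a whole does not end with a period.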

Anonymous user

Answer 3 · edited 2024-09-28 20:17:27

You can do this with any of the following regular expressions.

Option 1

This option uses re.DOTALL.

See regex in use here

(?:\A|\n{2})(?:(?!\n{2}).)+\.(?=\n{2}|\Z)

How it works:

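(?:\A|\n{2}) matches either of the following:
- \A asserts position at the start of the string (unlike ^, which in multiline mode asserts position at the start of a line)
- \n{2} matches two consecutive newline characters
(?:(?!\n{2}).)+ is a tempered greedy token: it matches any character, one or more times, as long as that character does not begin two consecutive newlines
\. matches . literally
(?=\n{2}|\Z) is a positive lookahead for either of the following (it asserts what follows the match without including it in the result):
- \n{2} matches two consecutive newline characters
- \Z is the counterpart of \A: it asserts position at the end of the string (unlike $, which in multiline mode asserts position at the end of a line)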
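A minimal Python sketch of Option 1 (the sample text is invented). With `re.DOTALL`, `.` also matches `\n`, so single line breaks inside a paragraph are allowed, while the tempered token stops at a blank line. A match after the first paragraph begins with the `\n\n` consumed by `(?:\A|\n{2})`, so the sketch strips it.

```python
import re

# Option 1; re.DOTALL lets '.' match '\n' as well.
OPTION_1 = re.compile(r'(?:\A|\n{2})(?:(?!\n{2}).)+\.(?=\n{2}|\Z)', re.DOTALL)

# Invented sample with three paragraphs; only the middle one ends with '.'.
# The first paragraph has a period in the middle, which must not count.
text = (
    "PA Criteria Details. More text here\nVersion 12 101"
    "\n\n"
    "Diagnosis of one of the following:\nA) Neovascular (wet) AMD."
    "\n\n"
    "Coverage Duration 12 months"
)

# Drop the leading '\n\n' separator that the pattern itself consumed.
paragraphs = [m.group(0).lstrip('\n') for m in OPTION_1.finditer(text)]
print(paragraphs)  # ['Diagnosis of one of the following:\nA) Neovascular (wet) AMD.']
```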

Option 2

This option is more efficient than Option 1: it takes about 22% fewer steps.

See regex in use here

(?:\A|\n{2})(?:.|\n(?!\n))+\.(?=\n{2}|\Z)

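How it works (mostly the same as above, so I'll only explain the differences):

(?:.|\n(?!\n))+ matches, one or more times, either any character except \n (this option does not use re.DOTALL, so . doesn't match newlines) or a \n that is not followed by another \n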
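The sketch below (sample text invented) runs Option 2 without any flags and, as a sanity check, compares its matches with Option 1 compiled with `re.DOTALL`. Both accept exactly the same characters (anything except a `\n` that is immediately followed by another `\n`), so they should return identical matches.

```python
import re

# Option 2: no flags; '\n' is accepted only when the next char is not '\n'.
OPTION_2 = re.compile(r'(?:\A|\n{2})(?:.|\n(?!\n))+\.(?=\n{2}|\Z)')
# Option 1 (needs re.DOTALL), compiled here only for comparison.
OPTION_1 = re.compile(r'(?:\A|\n{2})(?:(?!\n{2}).)+\.(?=\n{2}|\Z)', re.DOTALL)

text = (
    "First paragraph, line one.\nLine two ends with a period."
    "\n\n"
    "Second paragraph has no period"
    "\n\n"
    "Third paragraph.\nIt ends with one too."
)

opt2 = [m.group(0) for m in OPTION_2.finditer(text)]
opt1 = [m.group(0) for m in OPTION_1.finditer(text)]

# Matches after the first one keep their leading '\n\n' separator.
print(opt2)
print(opt1 == opt2)  # True
```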

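Option 3

This only works with PCRE or the PyPI regex package, not with Python's built-in re module. It is more efficient than the other options above: 21% fewer steps than Option 2 and 39% fewer than Option 1. This regex uses the re.DOTALL option.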

See regex in use here

(?:\A|\n{2})(?:\n{2}(*SKIP)(*FAIL)|.)+?\.(?=\n{2}|\Z)

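How it works (again, mostly the same, so I'll only explain the differences):

(?:\n{2}(*SKIP)(*FAIL)|.)+? matches either of the following one or more times, but as few times as possible (+? is a lazy quantifier):
- \n{2}(*SKIP)(*FAIL) matches two consecutive newlines and then forces a failure. (*SKIP)(*FAIL) works like magic: it prevents the regex from backtracking past its current position and then fails the current match attempt. Put simply, it skips every character matched to the left of (*SKIP) (including the \n\n) and resumes pattern matching after that position (see this question for details)
- . matches any character, including \n because of re.DOTALL

One consequence worth checking against your data: since the next attempt resumes right after the skipped \n\n, it starts at the first character of the following paragraph, where neither \A nor \n{2} can match. A paragraph that ends with a period but directly follows a paragraph that does not will therefore be skipped.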
