What regular expression should I use?

Posted 2024-09-28 19:05:36

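I am using NLTK's punkt tokenizer to split a paragraph into sentences, but in the examples below the tokenizer fails to detect the sentence boundary because the period is followed directly by digits. I would like to use a regular expression to detect these cases and replace '.1,7,9' with '. 1,7,9', i.e. add a space between the period and the citation.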

Ex1.  This is a random sentence.1,7,9 This is a sentence followed by it.
Ex2. I love football.1,7,24 I also like cricket.
Ex3. ESD for undifferentiated  cancers.[1][7] Cancers can be treatable.

Expected output:

Ex1. This is a random sentence.
     1,7,9 This is a sentence followed by it.
Ex2. I love football.
     1,7,24 I also like cricket.
Ex3. ESD for undifferentiated  cancers.
     [1][7] Cancers can be treatable.

Thank you.


2 answers

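The following regular expression replaces every period that is followed by a non-whitespace character with a period plus a newline (. + \n):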

>>> import re
>>> s = "Ex1.  This is a random sentence.1,7,9 This is a sentence followed by it."
>>> print(re.sub(r'\.(\S)', r'.\n\1', s))
Ex1.  This is a random sentence.
1,7,9 This is a sentence followed by it.
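
One caveat with \.(\S) is that it also fires on periods inside tokens such as 3.14 or e.g. The sketch below is one possible narrowing, not the answerer's code: it assumes a citation always follows a letter and starts with a digit or an opening bracket. The names CITATION_AFTER_PERIOD and separate_citation are hypothetical.

```python
import re

# Assumption: a citation is glued to a period that follows a letter and
# begins with a digit ("1,7,9") or a bracketed digit ("[1][7]").
CITATION_AFTER_PERIOD = re.compile(r'(?<=[A-Za-z])\.(?=\[?\d)')

def separate_citation(text, sep="\n"):
    """Insert `sep` between a sentence-ending period and a glued-on citation."""
    # A function replacement avoids escape processing of `sep`.
    return CITATION_AFTER_PERIOD.sub(lambda m: "." + sep, text)

examples = [
    "This is a random sentence.1,7,9 This is a sentence followed by it.",
    "I love football.1,7,24 I also like cricket.",
    "ESD for undifferentiated  cancers.[1][7] Cancers can be treatable.",
]
for example in examples:
    print(separate_citation(example))
```

Calling separate_citation(text, sep=" ") inserts a space instead of a newline, which is the form the question asked for before handing the text to the sentence tokenizer.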

If the appended list of integers is a citation, it may be more useful to put the line break after the list of integers:

>>> import re
>>> s = "Ex1.  This is a random sentence.1,7,9 This is a sentence followed by it."
>>> print(re.sub(r'(\.\S+\s)', r'\1\n', s))
Ex1.  This is a random sentence.1,7,9 
This is a sentence followed by it.
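
The pattern (\.\S+\s) treats any non-space run after a period as a citation, so it would also break after something like ".txt ". As a hedged variant, the sketch below only accepts the two citation shapes that appear in the question (comma-separated integers, or bracketed integers) and also drops the trailing space before the line break. The names TRAILING_CITATION and break_after_citation are hypothetical, and the letter-before-period requirement is an assumption.

```python
import re

# Assumed citation shapes: "1,7,9" or "[1][7]", glued to a period
# that directly follows a letter.
TRAILING_CITATION = re.compile(
    r'(?<=[A-Za-z])(\.(?:\d+(?:,\d+)*|(?:\[\d+\])+))\s*'
)

def break_after_citation(text):
    """Keep the citation with its sentence and start a new line after it."""
    # group(1) is the period plus the citation; following whitespace is dropped.
    return TRAILING_CITATION.sub(lambda m: m.group(1) + "\n", text)

examples = [
    "This is a random sentence.1,7,9 This is a sentence followed by it.",
    "I love football.1,7,24 I also like cricket.",
    "ESD for undifferentiated  cancers.[1][7] Cancers can be treatable.",
]
for example in examples:
    print(break_after_citation(example))
```

Text whose periods sit between digits, such as 3.14, is left untouched because the period must follow a letter.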
