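I'm using NLTK's punkt tokenizer to split a paragraph into sentences, but in the examples below the tokenizer fails to detect the sentence boundaries because the period is immediately followed by numbers (citation references). I'd like to use a regular expression to detect these cases and replace `.1,7,9` with `. 1,7,9`, i.e. add a space between the period and the references.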
Ex1. `This is a random sentence.1,7,9 This is a sentence followed by it.`
Ex2. `I love football.1,7,24 I also like cricket.`
Ex3. `ESD for undifferentiated cancers.[1][7] Cancers can be treatable.`
Expected output:
Ex1. This is a random sentence.
1,7,9 This is a sentence followed by it.
Ex2. I love football.
1,7,24 I also like cricket.
Ex3. ESD for undifferentiated cancers.
[1][7] Cancers can be treatable.
Thanks.
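A minimal sketch of the space-insertion approach described in the question, using only Python's standard `re` module. It assumes a citation is either a comma-separated list of integers (`1,7,24`) or a run of bracketed integers (`[1][7]`) glued to the period and followed by whitespace; the helper name `space_citations` is hypothetical. The `(?<!\d)` lookbehind keeps decimals such as `3.14` intact.

```python
import re

# Assumed citation shapes: "1,7,24" or "[1][7]", glued to the period and
# followed by whitespace. (?<!\d) skips decimals such as "3.14".
CITATION_AFTER_PERIOD = re.compile(
    r"(?<!\d)\.(?=(?:\d+(?:,\d+)*|(?:\[\d+\])+)\s)"
)


def space_citations(text: str) -> str:
    """Hypothetical helper: insert a space between a period and a citation."""
    # The lookahead only inspects the citation, so just the period is replaced.
    return CITATION_AFTER_PERIOD.sub(". ", text)


examples = [
    "This is a random sentence.1,7,9 This is a sentence followed by it.",
    "I love football.1,7,24 I also like cricket.",
    "ESD for undifferentiated cancers.[1][7] Cancers can be treatable.",
]
for ex in examples:
    print(space_citations(ex))
```

The result can then be passed to `nltk.sent_tokenize`. Whether punkt actually splits before a numeric token depends on its trained model, so it is worth checking on real data. The price of the lookbehind is that a sentence ending in a number (e.g. `in 2019.1,7 Next`) is left unchanged.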
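Use a regex to replace every dot that is followed by a non-whitespace character with `.` + `\n`, i.e. put a line break right after the period.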
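A sketch of that dot-plus-newline replacement. The answer's original pattern did not survive, so `\.(?=\S)` below is a reconstruction from the description (a dot followed by a non-whitespace character), not the answer's exact regex; the helper name `newline_after_dots` is hypothetical. Because it matches any dot glued to the next character, it also splits decimals such as `3.14` and abbreviations such as `e.g.`, as the second call shows.

```python
import re

# Reconstructed from the description: a dot followed by any non-whitespace
# character. The lookahead leaves that following character in place.
DOT_BEFORE_NON_SPACE = re.compile(r"\.(?=\S)")


def newline_after_dots(text: str) -> str:
    """Hypothetical helper: insert a newline after every dot not followed by whitespace."""
    return DOT_BEFORE_NON_SPACE.sub(".\n", text)


print(newline_after_dots("This is a random sentence.1,7,9 This is a sentence followed by it."))
print(newline_after_dots("Pi is 3.14."))  # pitfall: the decimal is split as well
```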
If the attached integer list is a citation, it may be more useful to put the line break after the integer list instead.
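A sketch of that variant, which keeps the citation with the sentence it belongs to and moves the line break after it. The citation shapes (`1,7,24`, `[1][7]`) and the helper name `break_after_citations` are assumptions; `(?<!\d)` leaves decimals such as `3.14` alone, and because the pattern requires trailing whitespace, a citation at the very end of the text is not touched.

```python
import re

# Group 1 captures the citation: "1,7,24" or "[1][7]" (assumed shapes).
# The whitespace after it is consumed and replaced by a single newline.
PERIOD_CITATION_SPACE = re.compile(r"(?<!\d)\.(\d+(?:,\d+)*|(?:\[\d+\])+)\s+")


def break_after_citations(text: str) -> str:
    """Hypothetical helper: move the line break after a citation glued to a period."""
    # r".\1\n": the period, the captured citation, then a newline.
    return PERIOD_CITATION_SPACE.sub(r".\1\n", text)


for ex in [
    "This is a random sentence.1,7,9 This is a sentence followed by it.",
    "ESD for undifferentiated cancers.[1][7] Cancers can be treatable.",
]:
    print(break_after_citations(ex))
```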