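Regex construct to get sentences from text in Python

2 answers

User

Answer 1 · edited 2024-05-19 10:08:32

The following will not work for every input, but it works for your specific input. You can adjust this expression further: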

([^!?.]+)[!?.\s]*(?![!?.])

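See the regex demo.

Details:

([^!?.]+) - capturing group 1 matches one or more characters other than !, ? and .
[!?.\s]* - zero or more !, ?, . or whitespace characters
(?![!?.]) - a negative lookahead: not followed by !, ? or .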
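The three parts listed above can also be written out with re.VERBOSE so that each piece carries its own comment. This is only a sketch of the same pattern; the name SENTENCE_RX and the short test string are made up for illustration.

```python
import re

# Same pattern as above, split across lines with re.VERBOSE.
# SENTENCE_RX is a hypothetical name used only for this sketch.
SENTENCE_RX = re.compile(r"""
    ([^!?.]+)      # group 1: one or more characters other than !, ? or .
    [!?.\s]*       # zero or more terminators or whitespace characters
    (?![!?.])      # negative lookahead: no further !, ? or . may follow
""", re.VERBOSE)

result = SENTENCE_RX.findall("One. Two?! Three")
print(result)  # ['One', 'Two', 'Three']
```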

In Python, use it with re.findall, which returns only the substrings captured by the capturing group when the pattern contains one:

import re
rx = r"([^!?.]+)[!?.\s]*(?![!?.])"
s = "this is the\nfirst sentence. Isn't\nit? Yes ! !! This \n\nlast bit :) is also a sentence, but \nwithout a terminator other than the end of the file\n"
sents = re.findall(rx, s)
print(sents)
# => ['this is the\nfirst sentence',
#     "Isn't\nit",
#     'Yes ',
#     'This \n\nlast bit :) is also a sentence, but \nwithout a terminator other than the end of the file\n']

See the Python demo.
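The captured sentences still contain the embedded newlines and trailing spaces from the input. If that is unwanted, one possible cleanup step is to collapse every run of whitespace into a single space. The helper name split_sentences is an assumption made for this sketch, not part of the re module, and the input is a shortened version of the string above.

```python
import re

rx = re.compile(r"([^!?.]+)[!?.\s]*(?![!?.])")

def split_sentences(text):
    """Return the captured sentences with whitespace runs collapsed to one space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces
    cleaned = (" ".join(chunk.split()) for chunk in rx.findall(text))
    # discard chunks that consisted of whitespace only
    return [c for c in cleaned if c]

s = "this is the\nfirst sentence. Isn't\nit? Yes ! !! This \n\nlast bit :) is also a sentence\n"
sentences = split_sentences(s)
print(sentences)
# ['this is the first sentence', "Isn't it", 'Yes', 'This last bit :) is also a sentence']
```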

User

Answer 2 · edited 2024-05-19 10:08:32

Try this:

re.split(r'!\s!+|\.|\?', s)
['this is the\nfirst sentence', " Isn't\nit", ' Yes ', ' This \n\nlast bit :) is also a sentence, but \nwithout a terminator other than the end of the file\n']
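One thing to watch with re.split: if the pattern contains a capturing group, the text matched by that group (or None when the group did not take part in the match) is inserted into the result list. The short comparison below is a sketch on a made-up string; the variable names are illustrative.

```python
import re

s = "One. Two? Yes ! !! Three"

# With a capturing group, re.split interleaves the group's text (or None).
with_group = re.split(r"(!\s!+)|\.|\?", s)
print(with_group)
# ['One', None, ' Two', None, ' Yes ', '! !!', ' Three']

# Without a capturing group, only the pieces between separators are returned.
without_group = re.split(r"!\s!+|\.|\?", s)
print(without_group)
# ['One', ' Two', ' Yes ', ' Three']
```

Leading spaces remain in each piece; calling str.strip() on the items removes them if needed.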
