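I am trying to write a function that takes a string of text as its argument and returns a list of the sentences in that text. Sentence-ending punctuation such as (., ?, !) should not be removed.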
I also don't want it to split on abbreviations (for example, the "Dr." in "Dr. Jones").
Should I build a dictionary of all abbreviations?
Given the input:
input = "I think Dr. Jones is busy now. Can you visit some other day? I was really surprised!"
Expected output:
output=['I think Dr. Jones is busy now.','Can you visit some other day?','I was really surprised!']
What I have tried:
# performing something like this:
output = input.split('.')
# will produce
'''
['I think Dr', ' Jones is busy now', ' Can you visit some other day? I was really surprised!']
'''
# whereas doing
output = input.split(' ')
# will produce
'''
['I', 'think', 'Dr.', 'Jones', 'is', 'busy', 'now.', 'Can', 'you', 'visit', 'some', 'other', 'day?', 'I', 'was', 'really', 'surprised!']
'''
The basic assumption is that the input text contains no unusual punctuation.
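One way to combine the two attempts above is to split on whitespace and then regroup the tokens, closing a sentence whenever a token ends in ., ? or ! and is not a known abbreviation. The following is only a minimal sketch under that assumption: the `ABBREVIATIONS` set and the `split_sentences` name are hypothetical, and the list is deliberately short, so it would need extending for real text.

```python
# Hypothetical, deliberately small abbreviation list; extend it as needed.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "e.g.", "i.e."}


def split_sentences(text):
    """Split text into sentences, keeping the closing '.', '?' or '!'."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        # A token closes a sentence if it ends in . ? ! and is not a
        # known abbreviation (compared case-insensitively).
        if token[-1] in ".?!" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that has no closing punctuation.
        sentences.append(" ".join(current))
    return sentences


text = ("I think Dr. Jones is busy now. Can you visit some other day? "
        "I was really surprised!")
print(split_sentences(text))
```

Because the tokens are rejoined with single spaces, runs of whitespace in the input are collapsed. This approach also treats an abbreviation at the very end of a sentence as a non-boundary, so it only fits the stated assumption of regular punctuation. For broader coverage, a trained tokenizer such as NLTK's `sent_tokenize` may be a better fit than a hand-maintained list.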
A clumsy way to achieve this is as follows:
It should produce: