Split text on spaces that are preceded by a non-letter character

import nltk

sample_text = """With this example I wanna make the point clear... I hope you get it! There are many coding languages out there, but which is the best? I would say there's no best. Change my mind - if you can!"""
split_text = nltk.tokenize.sent_tokenize(sample_text)
print(split_text)
# Output:
# ['With this example I wanna make the point clear...', 'I hope you get it!', 'There are many coding languages out there, but which is the best?', "I would say there's no best.", 'Change my mind - if you can!']

Desired output:

[
 'With this example I wanna make the point clear...',
 'I hope you get it!',
 'There are many coding languages out there,',
 'but which is the best?',
 "I would say there's no best.",
 'Change my mind -',
 'if you can!'
]

2 answers

User

#1 · Edited 2024-10-05 12:24:44

A regular expression works fine here. Try using this expression with re.split(): [!"\#$%&'()*+,\-.\/:;<=>?@\[\\\]^_`{|}~]
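For illustration, here is one way a punctuation class like the one above might be applied to the sample text. This is a sketch, not necessarily what the answer intended: building the class from string.punctuation with re.escape, and wrapping it in a lookbehind followed by a space, are assumptions chosen so that the punctuation stays attached to each piece instead of being consumed by the split.

```python
import re
import string

sample_text = (
    "With this example I wanna make the point clear... I hope you get it! "
    "There are many coding languages out there, but which is the best? "
    "I would say there's no best. Change my mind - if you can!"
)

# Character class covering ASCII punctuation; re.escape handles ], \, ^ and -.
punct_class = "[" + re.escape(string.punctuation) + "]"

# Assumption: split on a space only when the character before it is punctuation.
# The lookbehind is zero-width, so the punctuation itself is kept in the piece.
pattern = re.compile("(?<=" + punct_class + ") ")

pieces = pattern.split(sample_text)
print(pieces)
```

Unlike a plain "non-letter" test, this variant would not split after a digit, so "Python 3 is fun" stays in one piece.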

User

#2 · Edited 2024-10-05 12:24:44

You can split the string on every space that is preceded by a non-letter character:

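import re

# (?<=[^a-z]) is a lookbehind: the space must follow a non-letter character.
# re.I makes a-z cover uppercase letters as well.
split_text = re.split(r'(?<=[^a-z]) ', sample_text, flags=re.I)
print(split_text)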

Output:

[
 'With this example I wanna make the point clear...',
 'I hope you get it!',
 'There are many coding languages out there,',
 'but which is the best?',
 "I would say there's no best.",
 'Change my mind -',
 'if you can!'
]
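One caveat with [^a-z] is that accented letters such as "é" count as non-letters, so text in other languages may be split in the middle of a phrase. A possible Unicode-aware variant is sketched below. The helper name split_after_non_letter and the French test sentence are made up for illustration, and the negative lookbehind behaves slightly differently from the pattern above: it also matches a space at the very start of the string.

```python
import re


def split_after_non_letter(text):
    """Split on each space whose preceding character is not a letter."""
    # [^\W\d_] matches a single Unicode letter (a word character that is
    # neither a digit nor an underscore). The negative lookbehind rejects
    # any space that directly follows such a letter.
    return re.split(r"(?<![^\W\d_]) ", text)


french = "Un caf\u00e9 au lait, s'il vous pla\u00eet! Merci."

ascii_pieces = re.split(r"(?<=[^a-z]) ", french, flags=re.I)
unicode_pieces = split_after_non_letter(french)

print(ascii_pieces)    # splits after "café" because é is outside a-z
print(unicode_pieces)  # keeps "Un café au lait," together
```

For plain English input such as sample_text, both patterns give the same pieces.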
