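import nltk
from nltk.corpus import stopwords

# One-time setup: nltk.download('punkt'); nltk.download('stopwords')
# (newer NLTK releases may ask for 'punkt_tab' instead of 'punkt')
sentence = """At eight o'clock on Thursday morning ... Arthur didn't feel very good."""
tokens = nltk.word_tokenize(sentence)
# Build the stop-word set once instead of reloading the list for every token
stop_words = set(stopwords.words('english'))
filtered_tokens = [w for w in tokens if w.lower() not in stop_words]
print(tokens)
print(filtered_tokens)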
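The term you are looking for is stop word removal.
A powerful library for this is NLTK.
It handles more complex tokenization of the input text and makes it easy to remove stop words, among other things, as the code above does.
Running it prints the full token list, followed by the filtered list with the English stop words removed.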
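To make the idea concrete without any downloads, here is a minimal sketch of stop word removal in plain Python. It is not NLTK's implementation: the `STOP_WORDS` set is a tiny hypothetical list chosen for this example (NLTK's English list is much longer), `remove_stop_words` is an illustrative helper name, and the regex tokenizer is far cruder than `nltk.word_tokenize` (for instance, it keeps "didn't" as one token and drops punctuation).

```python
import re

# Hypothetical, deliberately tiny stop list for illustration only;
# NLTK's English stop-word list is much longer.
STOP_WORDS = {"at", "on", "very", "the", "a", "an"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Split text into word tokens and drop those in stop_words (case-insensitive)."""
    # Naive tokenizer: runs of letters, optionally followed by an apostrophe part.
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    return [t for t in tokens if t.lower() not in stop_words]


sentence = "At eight o'clock on Thursday morning Arthur didn't feel very good."
print(remove_stop_words(sentence))
# ['eight', "o'clock", 'Thursday', 'morning', 'Arthur', "didn't", 'feel', 'good']
```

Comparing in lowercase against a set keeps the filter case-insensitive and makes each membership check constant time, which is the same reason the stop-word list is best converted to a set once when using NLTK.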