Keyword extraction: plural/singular/past-tense/-ing forms of the same word

1 answer

User

#1 · Posted 2024-06-28 20:23:29

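Before doing anything else with the text, you need to stem or lemmatize it (and also remove stop words and punctuation). NLTK has built-in lemmatizers and stemmers that you can use:

For stemming: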

import nltk

from nltk.stem import PorterStemmer

porter = PorterStemmer()

print(porter.stem("cats"))  #  =>  cat
print(porter.stem("trouble"))  #  =>  troubl
print(porter.stem("troubling"))  #  =>  troubl
print(porter.stem("troubled"))  #  =>  troubl

From DataCamp:
"Stemming is the process of reducing inflection in words to their root forms such as mapping a group of words to the same stem even if the stem itself is not a valid word in the Language."

For lemmatization:

# Requires the WordNet data; run nltk.download("wordnet") once beforehand.
from nltk.stem import WordNetLemmatizer

wordnet_lemmatizer = WordNetLemmatizer()

wordnet_lemmatizer.lemmatize("has")  #  =>  has
wordnet_lemmatizer.lemmatize("was")  #  =>  wa

# lemmatize() treats every word as a noun by default (pos="n"), which is why "was" comes back as "wa".
# Pass the part of speech to get the verb lemma:
wordnet_lemmatizer.lemmatize("has", pos="v")  #  =>  have
wordnet_lemmatizer.lemmatize("was", pos="v")  #  =>  be

From DataCamp:
"Lemmatization, unlike Stemming, reduces the inflected words properly ensuring that the root word belongs to the language. In Lemmatization root word is called Lemma. A lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words."

You can read more about stemming and lemmatization with Python NLTK in this article.
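
To tie this back to the question, here is a minimal sketch of how normalized forms could feed a keyword count so that "cat" and "cats" land in the same bucket. Everything in it is an assumption for illustration: `toy_stem`, `keyword_counts`, and the tiny `STOP_WORDS` set are hypothetical names, and `toy_stem` is a crude suffix stripper, not the Porter algorithm. In practice you would probably pass `PorterStemmer().stem` (or a lemmatizer call) as the `normalize` argument instead.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NLTK ships a fuller one).
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "was", "were", "of", "to", "in"}


def toy_stem(word):
    """Crude suffix stripper for illustration only; NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_counts(text, normalize=toy_stem):
    """Count words after lowercasing, dropping punctuation and stop words,
    and mapping each remaining token through `normalize`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(normalize(t) for t in tokens if t not in STOP_WORDS)


counts = keyword_counts("The cat jumped. Cats are jumping, and a cat jumps!")
print(counts)
```

Keeping the normalizer as a parameter means the counting logic stays the same whether you choose stemming or lemmatization.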
