I want to stem Early Modern English text:
>>> sb.stem("loveth")
'lov'
Apparently, all I need is a small tweak to the Snowball stemmer:
And to put the endings into the English stemmer, the list

ed edly ing ingly

of step 1b must be extended to

ed edly ing ingly est eth

and, as far as the Snowball script is concerned, the endings "est" and "eth" must be treated in the same way as the ending "ing".
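The extension described above can be sketched as a toy rule. This is not the real Snowball algorithm (which also restores a final "e" in some cases and works within defined regions of the word); it only illustrates treating "est"/"eth" the same way as "ing": delete the ending if the remaining stem still contains a vowel.

```python
VOWELS = set("aeiouy")

def strip_early_modern(word: str) -> str:
    """Toy version of step 1b extended with the archaic endings
    'est'/'eth': delete the ending if the remaining stem contains
    a vowel (the same condition Snowball uses for 'ing' and 'ed')."""
    # Longest endings first, so "ingly" wins over "ing".
    for ending in ("ingly", "edly", "ing", "ed", "est", "eth"):
        if word.endswith(ending):
            stem = word[: -len(ending)]
            if any(c in VOWELS for c in stem):
                return stem
            break  # ending matched but stem has no vowel: leave as-is
    return word

print(strip_early_modern("loveth"))   # -> "lov"
print(strip_early_modern("walking"))  # -> "walk"
```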
Great, so I just need to change the variables, and perhaps add a special rule to handle "thee"/"thou"/"thy" and "shalt"/"shall". The NLTK documentation shows the variables as:
class nltk.stem.snowball.EnglishStemmer(ignore_stopwords=False)
Bases: nltk.stem.snowball._StandardStemmer
The English Snowball stemmer.
Variables:
__vowels – The English vowels.
__double_consonants – The English double consonants.
__li_ending – Letters that may directly appear before a word final ‘li’.
__step0_suffixes – Suffixes to be deleted in step 0 of the algorithm.
__step1a_suffixes – Suffixes to be deleted in step 1a of the algorithm.
__step1b_suffixes – Suffixes to be deleted in step 1b of the algorithm. (Here we go)
__step2_suffixes – Suffixes to be deleted in step 2 of the algorithm.
__step3_suffixes – Suffixes to be deleted in step 3 of the algorithm.
__step4_suffixes – Suffixes to be deleted in step 4 of the algorithm.
__step5_suffixes – Suffixes to be deleted in step 5 of the algorithm.
__special_words – A dictionary containing words which have to be stemmed specially. (I can stick my "thee"/"thou" and "shalt" issues here)
Now, the stupid question: how do I actually change those variables? However I try to access them, I always get "object has no attribute"...
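The "object has no attribute" error is what Python's name mangling produces: attributes with two leading underscores are rewritten to `_ClassName__name` inside the class body, so from outside the class the plain spelling does not exist. A minimal sketch with a stand-in class (the suffix values here are just the document's example list, not NLTK's exact contents):

```python
class EnglishStemmer:
    # Two leading underscores make Python rewrite this name to
    # _EnglishStemmer__step1b_suffixes everywhere inside the class body.
    __step1b_suffixes = ("ed", "edly", "ing", "ingly")

stemmer = EnglishStemmer()

try:
    stemmer.__step1b_suffixes  # no mangling outside the class body
except AttributeError as err:
    print(err)  # ... object has no attribute '__step1b_suffixes'

# The mangled name is reachable:
print(stemmer._EnglishStemmer__step1b_suffixes)
```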
Note that the step suffixes are tuples, which are immutable, so you cannot append or add to them the way you can with special_words; you have to copy them by casting to a list, append to that, and then overwrite the attribute, e.g.:
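A sketch of the whole tweak, assuming NLTK's EnglishStemmer keeps these as double-underscore class attributes (so they must be reached through the mangled `_EnglishStemmer__...` names, see above) and that the exact suffix ordering may differ between NLTK versions. The thee/thou/thy and shalt/shall mappings are illustrative choices, not part of NLTK:

```python
from nltk.stem.snowball import EnglishStemmer

stemmer = EnglishStemmer()

# Tuple -> list, append the archaic endings, cast back to a tuple.
# Assigning on the instance shadows the class attribute, so other
# EnglishStemmer instances keep the stock suffix list.
suffixes = list(stemmer._EnglishStemmer__step1b_suffixes)
for ending in ("est", "eth"):
    if ending not in suffixes:
        suffixes.append(ending)
stemmer._EnglishStemmer__step1b_suffixes = tuple(suffixes)

# special_words is a dict; build a fresh copy rather than mutating the
# shared class-level dict in place. The mappings are illustrative.
stemmer._EnglishStemmer__special_words = {
    **stemmer._EnglishStemmer__special_words,
    "thou": "thou",
    "thee": "thou",
    "thy": "thou",
    "shalt": "shall",
}

print(stemmer.stem("loveth"))
print(stemmer.stem("shalt"))
```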