How to stem Shakespeare/KJV text using nltk.stem.snowball

Posted 2024-05-19 06:47:03


I'd like to stem Early Modern English text:

sb.stem("loveth")
>>> "lov"

Apparently, all I need to do is make a small tweak to the Snowball stemmer:

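And to put the endings into the English stemmer, the list

ed edly ing ingly

of Step 1b becomes

ed edly ing ingly est eth

As far as the Snowball script is concerned, the endings "est" and "eth" must be placed in the same group as the ending "ing", so that they receive the same treatment.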
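The quoted change only touches the suffix-selection part of Step 1b: find the longest listed suffix the word ends with, and delete it if the part before it contains a vowel. Below is a deliberately simplified sketch of just that part. The function name `strip_step1b` is hypothetical, and the sketch leaves out the "eed"/"eedly" branch, the R1 region, the special handling of "y", and the follow-up repairs (re-adding "e", undoubling consonants) that the real stemmer performs.

```python
# Simplified illustration of Snowball English Step 1b suffix selection.
# Assumption: only the "delete the suffix if the preceding part contains a
# vowel" branch is modelled; everything else in Step 1b is omitted.
VOWELS = "aeiouy"

# Original Step 1b list plus the two Early Modern English endings,
# ordered longest-first so the longest matching suffix wins.
STEP1B_SUFFIXES = ("eedly", "ingly", "edly", "eed", "ing", "est", "eth", "ed")


def strip_step1b(word: str) -> str:
    for suffix in STEP1B_SUFFIXES:
        if word.endswith(suffix):
            if suffix in ("eed", "eedly"):
                # Not modelled: the real algorithm handles these separately.
                return word
            stem = word[: -len(suffix)]
            # Delete the suffix only if what remains contains a vowel.
            return stem if any(ch in VOWELS for ch in stem) else word
    return word


print(strip_step1b("loveth"))  # lov
print(strip_step1b("lovest"))  # lov
print(strip_step1b("loving"))  # lov
print(strip_step1b("best"))    # best ("b" has no vowel, so nothing is removed)
```

Because the follow-up repairs are omitted, this sketch stops at "lov". In the full algorithm a short stem such as "lov" gets an "e" added back, which is consistent with the `u'love'` result in the answer.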

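Great, so I just need to change the variables. I might also add special rules to handle archaic forms such as "thee"/"thou" and "shalt". The NLTK documentation shows the variables as: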

class nltk.stem.snowball.EnglishStemmer(ignore_stopwords=False)

Bases: nltk.stem.snowball._StandardStemmer

The English Snowball stemmer.

Variables:

__vowels – The English vowels.

__double_consonants – The English double consonants.

__li_ending – Letters that may directly appear before a word final ‘li’.

__step0_suffixes – Suffixes to be deleted in step 0 of the algorithm.

__step1a_suffixes – Suffixes to be deleted in step 1a of the algorithm.

__step1b_suffixes – Suffixes to be deleted in step 1b of the algorithm. (Here we go)

__step2_suffixes – Suffixes to be deleted in step 2 of the algorithm.

__step3_suffixes – Suffixes to be deleted in step 3 of the algorithm.

__step4_suffixes – Suffixes to be deleted in step 4 of the algorithm.

__step5_suffixes – Suffixes to be deleted in step 5 of the algorithm.

__special_words – A dictionary containing words which have to be stemmed specially. (I can stick my "thee"/"thou" and "shalt" issues here)

Now, the dumb question: how do I change these variables? Everywhere I try to access them, I get "object has no attribute"...


1 answer
Anonymous user
#1 · Posted 2024-05-19 06:47:03

Try:

>>> from nltk.stem import snowball
>>> stemmer = snowball.EnglishStemmer()
>>> stemmer.stem('thee')
u'thee'
>>> dir(stemmer)
['_EnglishStemmer__double_consonants', '_EnglishStemmer__li_ending', '_EnglishStemmer__special_words', '_EnglishStemmer__step0_suffixes', '_EnglishStemmer__step1a_suffixes', '_EnglishStemmer__step1b_suffixes', '_EnglishStemmer__step2_suffixes', '_EnglishStemmer__step3_suffixes', '_EnglishStemmer__step4_suffixes', '_EnglishStemmer__step5_suffixes', '_EnglishStemmer__vowels', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_r1r2_standard', '_rv_standard', 'stem', 'stopwords', 'unicode_repr']
>>> stemmer._EnglishStemmer__special_words
{u'exceeds': u'exceed', u'inning': u'inning', u'exceed': u'exceed', u'exceeding': u'exceed', u'succeeds': u'succeed', u'succeeded': u'succeed', u'skis': u'ski', u'gently': u'gentl', u'singly': u'singl', u'cannings': u'canning', u'early': u'earli', u'earring': u'earring', u'bias': u'bias', u'tying': u'tie', u'exceeded': u'exceed', u'news': u'news', u'herring': u'herring', u'proceeds': u'proceed', u'succeeding': u'succeed', u'innings': u'inning', u'proceeded': u'proceed', u'proceed': u'proceed', u'dying': u'die', u'outing': u'outing', u'sky': u'sky', u'andes': u'andes', u'idly': u'idl', u'outings': u'outing', u'ugly': u'ugli', u'only': u'onli', u'proceeding': u'proceed', u'lying': u'lie', u'howe': u'howe', u'atlas': u'atlas', u'earrings': u'earring', u'cosmos': u'cosmos', u'canning': u'canning', u'succeed': u'succeed', u'herrings': u'herring', u'skies': u'sky'}
>>> stemmer._EnglishStemmer__special_words['thee'] = 'thou'
>>> stemmer.stem('thee')
'thou'
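The `_EnglishStemmer__` prefix in the `dir()` output is Python's private-name mangling: an attribute written as `__name` inside a class body is stored as `_ClassName__name`, which is why `stemmer.__special_words` raises "object has no attribute". As far as the NLTK source shows, these tables are also class attributes, so an in-place change like the one above is shared by every `EnglishStemmer` instance. A minimal pure-Python sketch of both effects, using a hypothetical `ToyStemmer` class in place of NLTK's:

```python
class ToyStemmer:
    """Hypothetical stand-in that stores its lookup table the way
    EnglishStemmer appears to: as a double-underscore class attribute."""

    # Written as __special_words, stored as _ToyStemmer__special_words.
    __special_words = {"skies": "sky"}

    def stem(self, word):
        # Inside the class body this reference is mangled too,
        # so it actually reads self._ToyStemmer__special_words.
        return self.__special_words.get(word, word)


a = ToyStemmer()
b = ToyStemmer()

# Outside a class body no mangling happens, so this lookup fails.
try:
    a.__special_words
except AttributeError as err:
    print("AttributeError:", err)

# The mangled name works, and because the dict lives on the class,
# an in-place change made through one instance is seen by all of them.
a._ToyStemmer__special_words["thee"] = "thou"
print(b.stem("thee"))  # thou
```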


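Note that the step suffixes are tuples, which are immutable, so you can't append or add to them the way you can with the special words. You have to copy them, cast the copy to a list, append to it, and then overwrite the attribute, for example: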

>>> from nltk.stem import snowball
>>> stemmer = snowball.EnglishStemmer()
>>> stemmer._EnglishStemmer__step1b_suffixes
(u'eedly', u'ingly', u'edly', u'eed', u'ing', u'ed')
>>> step1b = stemmer._EnglishStemmer__step1b_suffixes 
>>> stemmer._EnglishStemmer__step1b_suffixes = list(step1b) + ['eth']
>>> stemmer.stem('loveth')
u'love'
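To keep these tweaks out of the shared class state, one option is a subclass that assigns per-instance copies of both tables. A minimal sketch, assuming the private names shown by `dir()` above (`_EnglishStemmer__step1b_suffixes`, `_EnglishStemmer__special_words`); the class name `EarlyModernEnglishStemmer` is hypothetical, and since these are private NLTK attributes the approach may break in other NLTK versions.

```python
from nltk.stem.snowball import EnglishStemmer


class EarlyModernEnglishStemmer(EnglishStemmer):
    """Hypothetical subclass: EnglishStemmer plus "est"/"eth" in Step 1b
    and an extra special word. Relies on NLTK's private, mangled names."""

    def __init__(self):
        super().__init__()
        # Writing self.__step1b_suffixes here would be mangled to
        # _EarlyModernEnglishStemmer__step1b_suffixes, so the parent's
        # mangled names are spelled out explicitly.
        self._EnglishStemmer__step1b_suffixes = tuple(
            EnglishStemmer._EnglishStemmer__step1b_suffixes
        ) + ("est", "eth")
        # Copy the dict instead of mutating the class-level one in place.
        self._EnglishStemmer__special_words = dict(
            EnglishStemmer._EnglishStemmer__special_words,
            thee="thou",  # example mapping taken from the answer above
        )


stemmer = EarlyModernEnglishStemmer()
print(stemmer.stem("loveth"))         # love
print(stemmer.stem("lovest"))         # love
print(stemmer.stem("thee"))           # thou
print(EnglishStemmer().stem("thee"))  # thee (stock stemmer untouched)
```

One design caveat: the extra endings also apply to ordinary words that happen to end in "est" or "eth" (for instance "interest"), so it may be worth checking the output on a sample of the target text.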
