<h2>TL;DR</h2>
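<pre><code>import re
from nltk.corpus import wordnet as wn

# Groups: lemma % ss_type : lex_filenum : lex_id : head_word : head_id
sense_key_regex = r"(.*)\%(.*):(.*):(.*):(.*):(.*)"
# Numeric ss_type -> part-of-speech letter used in NLTK synset names
synset_types = {1:'n', 2:'v', 3:'a', 4:'r', 5:'s'}

def synset_from_sense_key(sense_key):
    lemma, ss_type, lex_num, lex_id, head_word, head_id = re.match(sense_key_regex, sense_key).groups()
    # Reuses lex_id as the sense number; the two are not guaranteed to match for every key.
    ss_idx = '.'.join([lemma, synset_types[int(ss_type)], lex_id])
    return wn.synset(ss_idx)

x = "long%3:00:02::"
synset_from_sense_key(x)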
</code></pre>
<h2>Long version</h2>
<p>NLTK has a rather obtuse function for this. However, it does not read from the sense key but from the <code>data_file_map</code> (e.g. "data.adj", "data.noun", etc.): <a href="https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1355" rel="nofollow noreferrer">https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1355</a></p>
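<p>Since NLTK already has a human-readable API, and with some guidance from <a href="https://wordnet.princeton.edu/wordnet/man/senseidx.5WN.html" rel="nofollow noreferrer">https://wordnet.princeton.edu/wordnet/man/senseidx.5WN.html</a>, we can build the synset name ourselves. According to that guide, a sense key has the form <code>lemma%lex_sense</code>, where <code>lex_sense</code> is encoded as <code>ss_type:lex_filenum:lex_id:head_word:head_id</code>.</p>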
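<p>To make the field layout concrete, here is a minimal sketch that splits a sense key into its parts without any regex. The names <code>SenseKey</code>, <code>parse_sense_key</code> and <code>SS_TYPE_TO_POS</code> are hypothetical helpers made up for illustration, not part of NLTK, and the sketch assumes the lemma itself contains no <code>%</code> after the last separator.</p>

```python
from typing import NamedTuple

# Numeric ss_type -> one-letter POS tag used in NLTK synset names.
SS_TYPE_TO_POS = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}


class SenseKey(NamedTuple):  # hypothetical helper, not an NLTK class
    lemma: str
    ss_type: int
    lex_filenum: int
    lex_id: int
    head_word: str  # empty unless the sense is an adjective satellite
    head_id: str    # empty unless the sense is an adjective satellite


def parse_sense_key(sense_key: str) -> SenseKey:
    """Split 'lemma%ss_type:lex_filenum:lex_id:head_word:head_id' into fields."""
    lemma, sep, lex_sense = sense_key.rpartition("%")
    if not sep:
        raise ValueError(f"missing '%' in sense key: {sense_key!r}")
    parts = lex_sense.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields: {sense_key!r}")
    ss_type, lex_filenum, lex_id, head_word, head_id = parts
    return SenseKey(lemma, int(ss_type), int(lex_filenum), int(lex_id),
                    head_word, head_id)


key = parse_sense_key("long%3:00:02::")
print(key.lemma, SS_TYPE_TO_POS[key.ss_type], key.lex_id)
```

<p>Keeping the fields in a named tuple makes it easier to see which number ends up where when the synset name is assembled.</p>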
<p>We can use a regular expression (<a href="https://regex101.com/r/9KlVK7/1/" rel="nofollow noreferrer">https://regex101.com/r/9KlVK7/1/</a>):</p>
<pre><code>>>> import re
>>> sense_key_regex = r"(.*)\%(.*):(.*):(.*):(.*):(.*)"
>>> x = "long%3:00:02::"
>>> re.match(sense_key_regex, x)
<_sre.SRE_Match object at 0x10061ad78>
>>> re.match(sense_key_regex, x).groups()
('long', '3', '00', '02', '', '')
>>> lemma, ss_type, lex_num, lex_id, head_word, head_id = re.match(sense_key_regex, x).groups()
>>> synset_types = {1:'n', 2:'v', 3:'a', 4:'r', 5:'s'}
>>> '.'.join([lemma, synset_types[int(ss_type)], lex_id])
'long.a.02'
</code></pre>
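<p>The pattern above accepts almost any string that has one <code>%</code> and four colons. As a stricter variant, the sketch below uses named groups and <code>fullmatch</code>, and raises on malformed input. The function name <code>sense_key_to_synset_name</code> is hypothetical, and the digit widths in the pattern are an assumption based on the two-digit fields described in the senseidx guide.</p>

```python
import re

# Assumed field shapes: ss_type is 1-5, lex_filenum and lex_id are two digits,
# head_word and head_id may be empty.
SENSE_KEY_PATTERN = re.compile(
    r"(?P<lemma>.+)%(?P<ss_type>[1-5]):(?P<lex_filenum>\d{2}):(?P<lex_id>\d{2})"
    r":(?P<head_word>[^:]*):(?P<head_id>\d{0,2})"
)
POS_BY_SS_TYPE = {"1": "n", "2": "v", "3": "a", "4": "r", "5": "s"}


def sense_key_to_synset_name(sense_key):
    """Build a 'lemma.pos.NN' name from a sense key (hypothetical helper)."""
    match = SENSE_KEY_PATTERN.fullmatch(sense_key)
    if match is None:
        raise ValueError(f"not a WordNet sense key: {sense_key!r}")
    # Same construction as the session above: lemma, POS letter, lex_id.
    return ".".join([match["lemma"], POS_BY_SS_TYPE[match["ss_type"]], match["lex_id"]])


print(sense_key_to_synset_name("long%3:00:02::"))  # long.a.02
```

<p>Note that this only builds a name string. Because <code>lex_id</code> is not guaranteed to equal NLTK's sense number, it is worth checking the result for other keys; NLTK's <code>wn.lemma_from_key(key).synset()</code> may be a more direct route where it is available.</p>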
<p>Voilà, you can now get the NLTK <code>Synset()</code> object from the sense key:</p>
<pre><code>>>> from nltk.corpus import wordnet as wn
>>> idx = '.'.join([lemma, synset_types[int(ss_type)], lex_id])
>>> wn.synset(idx)
Synset('long.a.02')
</code></pre>