<p>On the "show me an example" front, here is an example of how to use semantic similarity to perform WSD:</p>
<pre><code>from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

def max_wupa(context_sentence, ambiguous_word):
    """
    WSD by Maximizing Wu-Palmer Similarity.

    Perform WSD by maximizing the sum of maximum Wu-Palmer scores between the
    possible synsets of all words in the context sentence and the possible
    synsets of the ambiguous word (see http://goo.gl/XMq2BI):

        argmax_{synset(a)} sum_{i=1}^{n} max_{synset(i)} Wu-Palmer(i, a)

    Wu-Palmer (1994) similarity scores two synsets by the depth of their
    least common subsumer relative to their own depths in the taxonomy
    (see http://acl.ldc.upenn.edu/P/P94/P94-1019.pdf).
    """
    result = {}
    for i in wn.synsets(ambiguous_word):
        # wup_similarity() returns None when two synsets share no common
        # subsumer; coerce those scores to 0 before taking the max.
        result[i] = sum(max([i.wup_similarity(k) or 0 for k in wn.synsets(j)] + [0])
                        for j in word_tokenize(context_sentence))
    return sorted([(v, k) for k, v in result.items()],
                  key=lambda t: t[0], reverse=True)

bank_sents = ['I went to the bank to deposit my money',
              'The river bank was full of dead fishes']
ans = max_wupa(bank_sents[0], 'bank')
print(ans)
print(ans[0][1].definition())  # definition() is a method in NLTK 3.x
</code></pre>
<p>(source: <a href="https://github.com/alvations/pywsd" rel="nofollow">pyWSD @ github</a>)</p>
<p>Use the above code with caution, because you need to consider:</p>
<ol>
<li>What really happens when we try to maximize the path similarity between every possible synset of every token in the context sentence and the possible synsets of the ambiguous word?</li>
<li>Is the maximization even logical when most of the path similarities come out as <code>None</code>, and by chance you get a rogue word that happens to be closely related to one synset of the ambiguous word (see the diagnostic sketch after this list)?</li>
</ol>
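<p>To make point 2 concrete, here is a small diagnostic sketch (my own addition, not part of pyWSD) that reuses the same NLTK calls to show, for each synset of <code>'bank'</code>, which context token contributes the largest Wu-Palmer score to its sum, with <code>None</code> scores coerced to 0 exactly as in <code>max_wupa</code> above:</p>
<pre><code>from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

sentence = 'The river bank was full of dead fishes'
for ss in wn.synsets('bank'):
    # Max Wu-Palmer score each context token contributes to this synset's
    # total; None (no common subsumer) is coerced to 0, as in max_wupa.
    per_token = {tok: max([ss.wup_similarity(k) or 0 for k in wn.synsets(tok)] + [0])
                 for tok in word_tokenize(sentence)}
    top = max(per_token, key=per_token.get)
    print(ss.name(), round(sum(per_token.values()), 2),
          'dominated by:', top, round(per_token[top], 2))
</code></pre>
<p>If one token's maximum dwarfs all the others, that token alone decides the ranking, which is exactly the "rogue word" failure mode described in point 2.</p>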