python nltk返回wordnet相似性度量的奇数结果

2024-09-19 23:44:36 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图用pythonnltk的wordnet来寻找两个单词之间的相似性。两个示例关键字是“game”和“leonardo”。首先,我提取了这两个单词的所有语法集,并对每个语法集进行交叉匹配,以找出它们的相似性。这是我的密码

from nltk.corpus import wordnet as wn

xx = wn.synsets("game")
yy = wn.synsets("leonardo")
for x in xx:
    for y in yy:
        print x.name
        print x.definition
        print y.name
        print y.definition
        print x.wup_similarity(y)
        print '\n'

以下是总产量:

game.n.01 a contest with rules to determine a winner leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.285714285714

game.n.02 a single play of a sport or other contest leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.285714285714

game.n.03 an amusement or pastime leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.25

game.n.04 animal hunted for food or sport leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.923076923077

game.n.05 (tennis) a division of play during which one player serves leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.222222222222

game.n.06 (games) the score at a particular point or the score needed to win leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.285714285714

game.n.07 the flesh of wild animals that is used for food leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.5

plot.n.01 a secret scheme to do something (especially something underhand or illegal) leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.2

game.n.09 the game equipment needed in order to play a particular game leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.666666666667

game.n.10 your occupation or line of work leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.25

game.n.11 frivolous or trifling behavior leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.222222222222

bet_on.v.01 place a bet on leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) -1

crippled.s.01 disabled in the feet or legs leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) -1

game.s.02 willing to face danger leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) -1

但game.n.04和leonardo.n.01之间的相似性真的很奇怪。我认为相似度(0.923076923077)不应该这么高。在

game.n.04

animal hunted for food or sport

leonardo.n.01

Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519)

0.923076923077

我的想法有什么问题吗?在


Tags: andofthegamemostleonardoarchitectpainter
1条回答
网友
1楼 · 发布于 2024-09-19 23:44:36

根据the docswup_similarity()方法返回。。。在

...a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node).

…还有。。。在

>>> from nltk.corpus import wordnet as wn
>>> game = wn.synset('game.n.04')
>>> leonardo = wn.synset('leonardo.n.01')
>>> game.lowest_common_hypernyms(leonardo)
[Synset('organism.n.01')]
>>> organism = game.lowest_common_hypernyms(leonardo)[0]
>>> game.shortest_path_distance(organism)
2
>>> leonardo.shortest_path_distance(organism)
3

…这就是为什么它认为它们是相似的,尽管我得到。。。在

^{pr2}$

……因为某些原因不同。在


更新

I want some measurement which will show that dissimilarity('game', 'chess') is much much less than dissimilarity('game', 'leonardo')

像这样的东西怎么样。。。在

from nltk.corpus import wordnet as wn
from itertools import product

def compare(word1, word2):
    ss1 = wn.synsets(word1)
    ss2 = wn.synsets(word2)
    return max(s1.path_similarity(s2) for (s1, s2) in product(ss1, ss2))

for word1, word2 in (('game', 'leonardo'), ('game', 'chess')):
    print "Path similarity of %-10s and %-10s is %.2f" % (word1,
                                                          word2,
                                                          compare(word1, word2))

…哪个指纹。。。在

Path similarity of game       and leonardo   is 0.17
Path similarity of game       and chess      is 0.25

相关问题 更多 >