我试图用pythonnltk的wordnet来寻找两个单词之间的相似性。两个示例关键字是“game”和“leonardo”。首先,我提取了这两个单词的所有语法集,并对每个语法集进行交叉匹配,以找出它们的相似性。这是我的密码
from nltk.corpus import wordnet as wn
xx = wn.synsets("game")
yy = wn.synsets("leonardo")
for x in xx:
for y in yy:
print x.name
print x.definition
print y.name
print y.definition
print x.wup_similarity(y)
print '\n'
以下是总产量:
game.n.01 a contest with rules to determine a winner leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.285714285714
game.n.02 a single play of a sport or other contest leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.285714285714
game.n.03 an amusement or pastime leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.25
game.n.04 animal hunted for food or sport leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.923076923077
game.n.05 (tennis) a division of play during which one player serves leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.222222222222
game.n.06 (games) the score at a particular point or the score needed to win leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.285714285714
game.n.07 the flesh of wild animals that is used for food leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.5
plot.n.01 a secret scheme to do something (especially something underhand or illegal) leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.2
game.n.09 the game equipment needed in order to play a particular game leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.666666666667
game.n.10 your occupation or line of work leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.25
game.n.11 frivolous or trifling behavior leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) 0.222222222222
bet_on.v.01 place a bet on leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) -1
crippled.s.01 disabled in the feet or legs leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) -1
game.s.02 willing to face danger leonardo.n.01 Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519) -1
但game.n.04和leonardo.n.01之间的相似性真的很奇怪。我认为相似度(0.923076923077)不应该这么高。在
game.n.04
animal hunted for food or sport
leonardo.n.01
Italian painter and sculptor and engineer and scientist and architect; the most versatile genius of the Italian Renaissance (1452-1519)
0.923076923077
我的想法有什么问题吗?在
根据the docs,
wup_similarity()
方法返回。。。在…还有。。。在
…这就是为什么它认为它们是相似的,尽管我得到。。。在
^{pr2}$……因为某些原因不同。在
更新
像这样的东西怎么样。。。在
…哪个指纹。。。在
相关问题 更多 >
编程相关推荐