Is WordNet path similarity commutative?

Posted 2024-10-01 11:28:20

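I am using the WordNet API from NLTK. When I compare one synset with another I get None, but when I compare them the other way around I get a float value.

Shouldn't they give the same value? Is there an explanation, or is this a bug in WordNet?

Example: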

wn.synset('car.n.01').path_similarity(wn.synset('automobile.v.01')) # None
wn.synset('automobile.v.01').path_similarity(wn.synset('car.n.01')) # 0.06666666666666667

2 Answers

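I don't think this is a defect in WordNet itself. In your example, automobile is specified as a verb and car as a noun, so you will need to look through the synsets to see what the graph looks like and decide whether the net is labeled correctly.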

A = 'car.n.01'
B = 'automobile.v.01'
C = 'automobile.n.01'


wn.synset(A).path_similarity(wn.synset(B)) # None, as reported in the question
wn.synset(B).path_similarity(wn.synset(A)) # 0.0666..., as reported in the question


wn.synset(A).path_similarity(wn.synset(C)) # is 1
wn.synset(C).path_similarity(wn.synset(A)) # is also 1
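
To make the part-of-speech mismatch visible before comparing, one option is a small check on the synset names. This is only a sketch: `synset_pos` and `same_pos` are hypothetical helpers (not part of NLTK), and they assume the names follow the `lemma.pos.nn` pattern used in the examples above.

```python
def synset_pos(name):
    """Return the POS tag from a name like 'car.n.01' (assumed 'lemma.pos.nn')."""
    # Split from the right so that dots inside the lemma do not confuse the split.
    lemma, pos, sense = name.rsplit(".", 2)
    return pos


def same_pos(name_a, name_b):
    """True when both synset names carry the same part-of-speech tag."""
    return synset_pos(name_a) == synset_pos(name_b)


print(same_pos("car.n.01", "automobile.v.01"))  # False
print(same_pos("car.n.01", "automobile.n.01"))  # True
```

A check like this only flags the mismatch; it does not say which synset you actually meant to compare.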

Technically, without the dummy root, the car and automobile synsets would have no link to each other:

>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
>>> print(x.shortest_path_distance(y))
None
>>> print(y.shortest_path_distance(x))
None

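Now, let's take a closer look at the dummy root issue. First, there is a neat function in NLTK that says whether a synset needs a dummy root:

>>> x._needs_root()
False
>>> y._needs_root()
True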

Next, when you look at the path_similarity code (http://nltk.googlecode.com/svn-/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.path_similarity), you can see:

def path_similarity(self, other, verbose=False, simulate_root=True):
  distance = self.shortest_path_distance(other, \
               simulate_root=simulate_root and self._needs_root())

  if distance is None or distance < 0:
    return None
  return 1.0 / (distance + 1)

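So for the automobile synset, the parameter simulate_root=simulate_root and self._needs_root() will always be True when you try y.path_similarity(x), and when you try x.path_similarity(y) it will always be False, because x._needs_root() is False: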

>>> True and y._needs_root()
True
>>> True and x._needs_root()
False
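
The asymmetry can be reproduced without WordNet at all. Below is a minimal sketch on a made-up five-node graph: `HYPERNYMS`, `needs_root` and the other functions are hypothetical stand-ins that follow the logic quoted above, not NLTK's actual implementation. Placing the fake root one step beyond the farthest ancestor is an assumption of this sketch.

```python
# Toy hypernym graph (hypothetical data, far smaller than WordNet).
HYPERNYMS = {
    "car.n": ["motor_vehicle.n"],
    "motor_vehicle.n": ["entity.n"],
    "entity.n": [],
    "automobile.v": ["travel.v"],
    "travel.v": [],
}


def needs_root(name):
    # Mirrors the behaviour shown above: verbs need a fake root, nouns do not.
    return name.endswith(".v")


def hypernym_distances(name, simulate_root=False):
    """Map the node and each of its ancestors to the shortest distance."""
    distances = {}
    stack = [(name, 0)]
    while stack:
        node, dist = stack.pop()
        if node not in distances or dist < distances[node]:
            distances[node] = dist
            stack.extend((parent, dist + 1) for parent in HYPERNYMS[node])
    if simulate_root:
        # Assumption: the fake root sits one step past the farthest ancestor.
        distances["*ROOT*"] = max(distances.values()) + 1
    return distances


def shortest_path_distance(a, b, simulate_root=False):
    d1 = hypernym_distances(a, simulate_root)
    d2 = hypernym_distances(b, simulate_root)
    common = set(d1) & set(d2)
    # No shared ancestor means there is no path at all.
    return min((d1[n] + d2[n] for n in common), default=None)


def path_similarity(a, b, simulate_root=True):
    # Same shape as the quoted code: only `a` decides whether a root is simulated.
    distance = shortest_path_distance(a, b, simulate_root and needs_root(a))
    if distance is None or distance < 0:
        return None
    return 1.0 / (distance + 1)


print(path_similarity("car.n", "automobile.v"))  # None
print(path_similarity("automobile.v", "car.n"))  # 0.1666... (1/6) in this toy graph
```

The only thing that differs between the two calls is which synset plays the role of `self`, which is exactly where the `and self._needs_root()` condition comes in.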

Now, when path_similarity() passes down to shortest_path_distance() (https://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.shortest_path_distance) and then on to hypernym_distances(), it tries to fetch a list of hypernyms and check their distances. Without simulate_root = True, the automobile synset will not connect to the car synset, and vice versa:

>>> y.hypernym_distances(simulate_root=True)
set([(Synset('automobile.v.01'), 0), (Synset('*ROOT*'), 2), (Synset('travel.v.01'), 1)])
>>> y.hypernym_distances()
set([(Synset('automobile.v.01'), 0), (Synset('travel.v.01'), 1)])
>>> x.hypernym_distances()
set([(Synset('object.n.01'), 8), (Synset('self-propelled_vehicle.n.01'), 2), (Synset('whole.n.02'), 8), (Synset('artifact.n.01'), 7), (Synset('physical_entity.n.01'), 10), (Synset('entity.n.01'), 11), (Synset('object.n.01'), 9), (Synset('instrumentality.n.03'), 5), (Synset('motor_vehicle.n.01'), 1), (Synset('vehicle.n.01'), 4), (Synset('entity.n.01'), 10), (Synset('physical_entity.n.01'), 9), (Synset('whole.n.02'), 7), (Synset('conveyance.n.03'), 5), (Synset('wheeled_vehicle.n.01'), 3), (Synset('artifact.n.01'), 6), (Synset('car.n.01'), 0), (Synset('container.n.01'), 4), (Synset('instrumentality.n.03'), 6)])
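
As a rough cross-check of where the 0.0666... in the question comes from, the distances printed above can be plugged into the formula by hand. This sketch assumes the fake `*ROOT*` is placed one step beyond the farthest real hypernym. That is consistent with the `(Synset('*ROOT*'), 2)` entry above, but it is an assumption here rather than something quoted from the NLTK source; `fake_root_distance` is a hypothetical helper.

```python
# Distances copied from the hypernym_distances() output above.
y_distances = {"automobile.v.01": 0, "travel.v.01": 1}
x_max_distance = 11  # farthest hypernym of car.n.01 listed above (entity.n.01)


def fake_root_distance(max_distance):
    # Assumption: *ROOT* sits one step beyond the farthest real hypernym.
    return max_distance + 1


y_root = fake_root_distance(max(y_distances.values()))  # 2, matching the output above
x_root = fake_root_distance(x_max_distance)             # 12 under the same assumption

# *ROOT* is the only node the two synsets share, so the path runs through it.
distance = y_root + x_root
similarity = 1.0 / (distance + 1)

print(distance)    # 14
print(similarity)  # 0.06666666666666667
```

Under that assumption the result matches the value reported for `wn.synset('automobile.v.01').path_similarity(wn.synset('car.n.01'))`.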

So in theory, the right path_similarity is 0/None, but because of the simulate_root=simulate_root and self._needs_root() parameter, nltk.corpus.wordnet.path_similarity() in NLTK's API is not commutative.

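BUT the code is also not wrong or bugged: any synset distance measured by going through the root will be consistently far, because the position of the dummy *ROOT* never changes. So the best practice for calculating path similarity is:

>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
>>> sims = [wn.path_similarity(x, y), wn.path_similarity(y, x)]

# When you never want None: going through the *ROOT* will always
# get you some sort of distance from synset x to synset y.
# (Python 3 cannot order None against a float, so None is filtered out first.)
>>> max((s for s in sims if s is not None), default=None)

# When you can allow None in the synset similarity comparison
>>> None if None in sims else min(sims)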
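
If this comparison is needed in several places, the two variants can be wrapped in one helper. The following is a sketch, not part of NLTK: `symmetric_similarity` and its `allow_none` flag are hypothetical names, and `toy_sim` is a stand-in with made-up behaviour so the example runs without the WordNet corpus. With NLTK installed you would presumably pass `wn.path_similarity` as `sim` instead.

```python
def symmetric_similarity(sim, a, b, allow_none=False):
    """Combine sim(a, b) and sim(b, a) into one order-independent score."""
    scores = [sim(a, b), sim(b, a)]
    if allow_none:
        # Strict variant: any missing direction makes the result None.
        return None if None in scores else min(scores)
    # Lenient variant: keep the best score that is not None, if any.
    found = [s for s in scores if s is not None]
    return max(found) if found else None


# Stand-in for wn.path_similarity with made-up behaviour (illustration only).
def toy_sim(a, b):
    return None if a == "car.n.01" else 1.0 / 15


print(symmetric_similarity(toy_sim, "car.n.01", "automobile.v.01"))  # 0.06666666666666667
print(symmetric_similarity(toy_sim, "automobile.v.01", "car.n.01"))  # 0.06666666666666667
print(symmetric_similarity(toy_sim, "car.n.01", "automobile.v.01", allow_none=True))  # None
```

Filtering out None explicitly keeps the helper working on Python 3, where `max(None, 0.5)` raises a TypeError.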
