<p>Technically, without the dummy root, the <code>car</code> and <code>automobile</code> synsets have no link to each other:</p>
<pre><code>>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
>>> print(x.shortest_path_distance(y))
None
>>> print(y.shortest_path_distance(x))
None
</code></pre>
<p>Now let's look at the dummy root issue more closely. First, NLTK has a neat function that tells you whether a synset needs a dummy root:</p>
<pre><code>>>> x._needs_root()
False
>>> y._needs_root()
True
</code></pre>
<p>Next, if you look at the <code>path_similarity</code> source (<a href="http://nltk.googlecode.com/svn-/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.path_similarity" rel="noreferrer">http://nltk.googlecode.com/svn-/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.path_similarity</a>), you can see:</p>
<pre><code>def path_similarity(self, other, verbose=False, simulate_root=True):
    distance = self.shortest_path_distance(other, \
        simulate_root=simulate_root and self._needs_root())
    if distance is None or distance < 0:
        return None
    return 1.0 / (distance + 1)
</code></pre>
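<p>So for the <code>automobile</code> synset, the argument <code>simulate_root=simulate_root and self._needs_root()</code> is always <code>True</code> when you call <code>y.path_similarity(x)</code>, and always <code>False</code> when you call <code>x.path_similarity(y)</code>, because <code>x._needs_root()</code> is <code>False</code>:</p>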
<pre><code>>>> True and y._needs_root()
True
>>> True and x._needs_root()
False
</code></pre>
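<p>The asymmetry can be reproduced without WordNet. The sketch below is a toy model, not NLTK's code: <code>ToySynset</code>, its <code>needs_root</code> flag, and the constant <code>ROOT_PATH_LENGTH</code> are hypothetical stand-ins, and the value 13 is only an assumed path length through the simulated root. The one thing it copies from the excerpt is that only the caller's flag is consulted.</p>

```python
# Toy model of the asymmetry; none of these names come from NLTK.
ROOT_PATH_LENGTH = 13  # assumed path length through the simulated root


class ToySynset:
    def __init__(self, name, needs_root):
        self.name = name
        self.needs_root = needs_root  # stands in for Synset._needs_root()

    def shortest_path_distance(self, other, simulate_root=False):
        # The two toy synsets live in separate hierarchies, so they are
        # connected only when a simulated root is allowed.
        if self is other:
            return 0
        return ROOT_PATH_LENGTH if simulate_root else None

    def path_similarity(self, other, simulate_root=True):
        # Same shape as the excerpt: only the caller's flag is consulted.
        distance = self.shortest_path_distance(
            other, simulate_root=simulate_root and self.needs_root)
        if distance is None or distance < 0:
            return None
        return 1.0 / (distance + 1)


car = ToySynset("car.n.01", needs_root=False)
automobile = ToySynset("automobile.v.01", needs_root=True)

print(automobile.path_similarity(car))  # 1/14, about 0.071
print(car.path_similarity(automobile))  # None
```

<p>Swapping caller and argument changes the result because the flag depends on <code>self</code> alone.</p>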
<p>When <code>path_similarity()</code> hands off to <code>shortest_path_distance()</code> (<a href="https://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.shortest_path_distance" rel="noreferrer">https://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet-pysrc.html#Synset.shortest_path_distance</a>) and then to <code>hypernym_distances()</code>, it collects a list of hypernyms and checks their distances. Without <code>simulate_root = True</code>, the <code>automobile</code> synset is not connected to <code>car</code>, and vice versa:</p>
<pre><code>>>> y.hypernym_distances(simulate_root=True)
set([(Synset('automobile.v.01'), 0), (Synset('*ROOT*'), 2), (Synset('travel.v.01'), 1)])
>>> y.hypernym_distances()
set([(Synset('automobile.v.01'), 0), (Synset('travel.v.01'), 1)])
>>> x.hypernym_distances()
set([(Synset('object.n.01'), 8), (Synset('self-propelled_vehicle.n.01'), 2), (Synset('whole.n.02'), 8), (Synset('artifact.n.01'), 7), (Synset('physical_entity.n.01'), 10), (Synset('entity.n.01'), 11), (Synset('object.n.01'), 9), (Synset('instrumentality.n.03'), 5), (Synset('motor_vehicle.n.01'), 1), (Synset('vehicle.n.01'), 4), (Synset('entity.n.01'), 10), (Synset('physical_entity.n.01'), 9), (Synset('whole.n.02'), 7), (Synset('conveyance.n.03'), 5), (Synset('wheeled_vehicle.n.01'), 3), (Synset('artifact.n.01'), 6), (Synset('car.n.01'), 0), (Synset('container.n.01'), 4), (Synset('instrumentality.n.03'), 6)])
</code></pre>
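<p>To see why the simulated root is what joins the two hierarchies, here is a minimal sketch in plain Python. The helpers <code>add_root</code> and <code>shortest_distance</code> are hypothetical names, not NLTK functions, and the sketch assumes the simulated root sits one step above the farthest hypernym, which matches the <code>*ROOT*</code> distance of 2 in the output above. The distances are the minimum values from that output, with the <code>car.n.01</code> hierarchy abbreviated.</p>

```python
def add_root(distances):
    """Return a copy with a simulated *ROOT* one step above the farthest ancestor."""
    with_root = dict(distances)
    with_root["*ROOT*"] = max(distances.values()) + 1
    return with_root


def shortest_distance(d1, d2):
    """Smallest summed distance via any shared ancestor, or None if none is shared."""
    shared = set(d1) & set(d2)
    if not shared:
        return None
    return min(d1[s] + d2[s] for s in shared)


# Minimum distances taken from the hypernym_distances() output above
# (car.n.01 is abbreviated to a few of its ancestors).
verb = {"automobile.v.01": 0, "travel.v.01": 1}
noun = {"car.n.01": 0, "motor_vehicle.n.01": 1, "entity.n.01": 10}

print(shortest_distance(verb, noun))                      # None
print(shortest_distance(add_root(verb), add_root(noun)))  # 13
```

<p>With no shared ancestor the distance is <code>None</code>; once both sides gain a <code>*ROOT*</code> entry, the only shared node is the root itself, so every such path is long.</p>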
<p>So in theory the correct <code>path_similarity</code> is 0/None, but because of the <code>simulate_root=simulate_root and self._needs_root()</code> argument,</p>
<p><strong><code>nltk.corpus.wordnet.path_similarity()</code> in NLTK's API is not commutative.</strong></p>
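<p>The code is not wrong or buggy, though: any synset distance measured through the root will always be large, because the position of the dummy <code>*ROOT*</code> never changes. The best practice for computing path similarity is therefore:</p>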
<pre><code>>>> from nltk.corpus import wordnet as wn
>>> x = wn.synset('car.n.01')
>>> y = wn.synset('automobile.v.01')
# When you never want a None value, since going through
# the *ROOT* will always give you some sort of distance
# from synset x to synset y
>>> max(wn.path_similarity(x,y), wn.path_similarity(y,x))
# when you can allow None in synset similarity comparison
>>> min(wn.path_similarity(x,y), wn.path_similarity(y,x))
</code></pre>
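<p>One caveat: in Python 3, comparing <code>None</code> with a number raises <code>TypeError</code>, so <code>max()</code> and <code>min()</code> over scores that may be <code>None</code> only behave as described on Python 2. A minimal None-safe sketch follows; <code>best_similarity</code> and <code>strict_similarity</code> are hypothetical helper names, not part of NLTK.</p>

```python
def best_similarity(a, b):
    """Largest non-None score, or None when both scores are None."""
    scores = [s for s in (a, b) if s is not None]
    return max(scores) if scores else None


def strict_similarity(a, b):
    """None if either direction is None, otherwise the smaller score."""
    if a is None or b is None:
        return None
    return min(a, b)


print(best_similarity(None, 0.25))    # 0.25
print(strict_similarity(None, 0.25))  # None
```

<p>With the synsets above, the call would look like <code>best_similarity(wn.path_similarity(x, y), wn.path_similarity(y, x))</code>.</p>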