How do I get all hyponyms that share a specific lowest common hypernym in NLTK WordNet?

Posted 2024-09-22 16:34:28


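If there is a path from two synsets up to their lowest common hypernym, it seems there should be some way to walk back down and find the hyponyms that lead to that hypernym.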

from nltk.corpus import wordnet as wn
alaska = wn.synset('Alaska.n.1')
california = wn.synset('California.n.1')
common_hypernym = alaska.lowest_common_hypernyms(california)[0]

common_hypernym
Synset('american_state.n.01')

common_hypernym.do_something_awesome()
['Alabama.n.1', 'Alaska.n.1', ...] #all 50 american states

2 Answers

A more recent solution is:

from nltk.corpus import wordnet
alaska = wordnet.synset('Alaska.n.1')
california = wordnet.synset('California.n.1')
alaska.lowest_common_hypernyms(california)

[Synset('american_state.n.01')]

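The old function is private and does not work this way (it may do something else). In any case, you can also use x.common_hypernyms(y) to find all of the common hypernyms, not just the lowest one.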
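To make the difference between "all common hypernyms" and "the lowest common hypernym" concrete, here is a minimal sketch on a toy hierarchy. It does not call NLTK and is not how NLTK implements these methods. The parent links are an assumption based on the distances shown in the other answer's path output, California's chain is assumed to match Alaska's, and the function names are hypothetical.

```python
# Toy parent links, NOT read from WordNet. The chain follows the distances
# listed in the other answer; California's chain is assumed to be identical.
PARENT = {
    "alaska": "american_state",
    "california": "american_state",
    "american_state": "state",
    "state": "administrative_district",
    "administrative_district": "district",
    "district": "region",
    "region": "location",
    "location": "object",
    "object": "physical_entity",
    "physical_entity": "entity",
}


def ancestors(node):
    """Return the ancestors of `node`, nearest first."""
    chain = []
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def common_ancestors(a, b):
    """Return every ancestor shared by `a` and `b`, nearest to `a` first."""
    shared = set(ancestors(b))
    return [n for n in ancestors(a) if n in shared]


def lowest_common_ancestor(a, b):
    """Return the shared ancestor closest to `a`, or None if there is none."""
    shared = common_ancestors(a, b)
    return shared[0] if shared else None


print(common_ancestors("alaska", "california"))
print(lowest_common_ancestor("alaska", "california"))
```

In this toy tree every node has a single parent, so "closest to `a`" is unambiguous. Real WordNet synsets can have several hypernyms, which is one reason the NLTK method returns a list.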

Use the private method Synset1._shortest_hypernym_paths(Synset2) to find the hypernyms and their distances:

>>> from nltk.corpus import wordnet as wn
>>> alaska = wn.synset('Alaska.n.1')
>>> california = wn.synset('California.n.1')

>>> alaska._shortest_hypernym_paths(california)
{Synset('district.n.01'): 4, Synset('location.n.01'): 6, Synset('region.n.03'): 5, Synset('physical_entity.n.01'): 8, Synset('entity.n.01'): 9, Synset('state.n.01'): 2, Synset('administrative_district.n.01'): 3, Synset('object.n.01'): 7, Synset('alaska.n.01'): 0, Synset('*ROOT*'): 10, Synset('american_state.n.01'): 1}

Now find the entry with the minimum distance:

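>>> paths = alaska._shortest_hypernym_paths(california)
>>> min(paths, key=paths.get)
Synset('alaska.n.01')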

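Now, that is boring, because california and alaska are sister nodes in the WordNet hierarchy and the entry at distance 0 is alaska itself. Let's filter out the zero-distance entries: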

>>> paths = {k:v for k,v in paths.items() if v > 0}
>>> min(paths, key=paths.get)
Synset('american_state.n.01')
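The same filter-then-minimize step can be tried without NLTK installed. The sketch below copies the distances from the output above and uses plain strings as stand-ins for Synset objects. The helper name `nearest_hypernym` is hypothetical.

```python
# Distances copied from the output above; strings stand in for Synset objects.
paths = {
    "district.n.01": 4, "location.n.01": 6, "region.n.03": 5,
    "physical_entity.n.01": 8, "entity.n.01": 9, "state.n.01": 2,
    "administrative_district.n.01": 3, "object.n.01": 7,
    "alaska.n.01": 0, "*ROOT*": 10, "american_state.n.01": 1,
}


def nearest_hypernym(distances):
    """Return the key with the smallest strictly positive distance."""
    # Distance 0 is the starting synset itself, so drop it first.
    candidates = {k: v for k, v in distances.items() if v > 0}
    return min(candidates, key=candidates.get)


nearest = nearest_hypernym(paths)
print(nearest)  # american_state.n.01
```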

To get the child nodes of american_state (I guess this is the "something awesome" you need...):

>>> min(paths, key=paths.get).hyponyms()
[Synset('free_state.n.02'), Synset('slave_state.n.01')]
>>> list(min(paths, key=paths.get).closure(lambda s:s.hyponyms()))
[Synset('free_state.n.02'), Synset('slave_state.n.01')]

This may look shocking, but alaska and california actually have no hypernyms assigned:

>>> alaska.hypernyms()
[]
>>> california.hypernyms()
[]

The connection found with _shortest_hypernym_paths is made through a dummy root; take a look at "Is wordnet path similarity commutative?"
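One likely explanation for the empty lists above is that WordNet links named entities such as individual states to their class through instance relations. NLTK exposes these as `instance_hypernyms()` and `instance_hyponyms()`, separate from `hypernyms()` and `hyponyms()`. Under that assumption, a minimal sketch for collecting everything below a synset follows both kinds of link. The helper name `collect_hyponyms` is hypothetical and not part of NLTK; it only relies on the object having those two methods.

```python
def collect_hyponyms(synset):
    """Return every synset below `synset`, following both ordinary
    hyponym links and instance-hyponym links, without duplicates."""
    found = []
    seen = set()
    stack = [synset]
    while stack:
        current = stack.pop()
        # Both methods return lists, so they can be concatenated.
        for child in current.hyponyms() + current.instance_hyponyms():
            if child not in seen:
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found


# Usage, assuming NLTK and its WordNet corpus are installed:
#   from nltk.corpus import wordnet as wn
#   states = collect_hyponyms(wn.synset('american_state.n.01'))
#   print(sorted(s.name() for s in states))
```

If only the direct instances are wanted, calling `instance_hyponyms()` on the common hypernym alone may be enough. The traversal is useful when instances hang off intermediate subclasses.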
