How do I generate a tree from the output of a dependency parser?

Posted 2024-07-05 15:02:46


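I am trying to generate a tree (a nested dictionary) from the output of a dependency parser. The sentence is "I shot an elephant in my sleep". I can obtain the output described in this link: How do I do dependency parsing in NLTK?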

nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)
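
Each line above has the form relation(head-index, dependent-index). As a minimal sketch, assuming every line follows that format, a regular expression can turn such lines into (relation, head, dependent) tuples. The helper name parse_dependency_line is hypothetical and not part of NLTK.

```python
import re

# Matches lines such as "nsubj(shot-2, I-1)":
# relation, then head-index and dependent-index inside the parentheses.
DEP_LINE = re.compile(r"^([\w:]+)\((.+)-(\d+), (.+)-(\d+)\)$")


def parse_dependency_line(line):
    """Return (relation, head, dependent), dropping the token indices."""
    match = DEP_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised dependency line: %r" % line)
    rel, head, _head_idx, dep, _dep_idx = match.groups()
    return rel, head, dep


lines = [
    "nsubj(shot-2, I-1)",
    "det(elephant-4, an-3)",
    "dobj(shot-2, elephant-4)",
    "prep(shot-2, in-5)",
    "poss(sleep-7, my-6)",
    "pobj(in-5, sleep-7)",
]
triples = [parse_dependency_line(line) for line in lines]
print(triples)
```

Dropping the indices keeps the tuples simple, but it also means a word that occurs twice in a sentence can no longer be told apart; keeping the index in the tuple would avoid that.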

To convert this list of tuples into a nested dictionary, I used the following link: How to convert python list of tuples into tree?

[code block missing from the source]

The output is as follows:

{'shot': (('ROOT', 'ROOT'),
  {'I': (('nsubj', 'shot'), {}),
   'elephant': (('dobj', 'shot'), {'an': (('det', 'elephant'), {})}),
   'sleep': (('nmod', 'shot'),
    {'in': (('case', 'sleep'), {}), 'my': (('nmod:poss', 'sleep'), {})})})}
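
A nested dictionary of this shape can be produced from (relation, parent, child) tuples in two passes. The following is only a sketch and not the code from the linked answer; build_tree is a hypothetical name, and the sketch assumes that every word occurs only once in the sentence and that the root is marked with the parent 'ROOT'.

```python
def build_tree(triples):
    """Map each word to ((relation, parent), children) and nest the children."""
    # First pass: one entry per word, each with an empty children dict.
    nodes = {child: ((rel, parent), {}) for rel, parent, child in triples}

    # Second pass: hang every word under its parent; the root goes to the top.
    tree = {}
    for rel, parent, child in triples:
        if parent == 'ROOT':
            tree[child] = nodes[child]
        else:
            nodes[parent][1][child] = nodes[child]
    return tree


triples = [
    ('ROOT', 'ROOT', 'shot'), ('nsubj', 'shot', 'I'),
    ('det', 'elephant', 'an'), ('dobj', 'shot', 'elephant'),
    ('case', 'sleep', 'in'), ('nmod:poss', 'sleep', 'my'),
    ('nmod', 'shot', 'sleep'),
]
tree = build_tree(triples)
print(tree)
```

Creating all nodes before linking them means the tuples can arrive in any order, which matters here because 'an' is listed before its parent 'elephant'.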

To find the root-to-leaf path, I used the following link: Return root to specific leaf from a nested dictionary tree

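[Creating the tree and finding the path are two different things.] The second goal is to find the path from the root to a leaf node, as done in Return root to specific leaf from a nested dictionary tree. However, I want the dependency path from the root to the leaf. For example, when I call recurse_category(categories, 'an'), where categories is the nested tree structure and 'an' is a word in the tree, I should get ROOT-nsubj-dobj (the dependency relations up to the root) as output.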


2 answers

This converts the output into nested-dictionary form. I will let you know if I also manage to find the path. Hopefully this helps.

# (relation, parent, child) tuples; the entry whose parent is 'ROOT' is the root word
list_of_tuples = [('ROOT', 'ROOT', 'shot'), ('nsubj', 'shot', 'I'),
                  ('det', 'elephant', 'an'), ('dobj', 'shot', 'elephant'),
                  ('case', 'sleep', 'in'), ('nmod:poss', 'sleep', 'my'),
                  ('nmod', 'shot', 'sleep')]

# First pass: create one node per word
nodes = {}
for rel, parent, child in list_of_tuples:
    nodes[child] = {'Name': child, 'Relationship': rel}

# Second pass: attach each node to its parent; the root node goes into the forest
forest = []
for rel, parent, child in list_of_tuples:
    node = nodes[child]
    if parent == 'ROOT':  # this is the root node
        forest.append(node)
    else:
        nodes[parent].setdefault('children', []).append(node)

print(forest)

The output is a nested dictionary (wrapped in a list):

[{'Name': 'shot', 'Relationship': 'ROOT', 'children': [{'Name': 'I', 'Relationship': 'nsubj'}, {'Name': 'elephant', 'Relationship': 'dobj', 'children': [{'Name': 'an', 'Relationship': 'det'}]}, {'Name': 'sleep', 'Relationship': 'nmod', 'children': [{'Name': 'in', 'Relationship': 'case'}, {'Name': 'my', 'Relationship': 'nmod:poss'}]}]}]

The following function can help you find the root-to-leaf path:

[code block missing from the source]
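
The function from the original answer is not available on this page. As a rough sketch of one way to do it, and not the answerer's code, a depth-first search over the forest structure built above can collect the 'Relationship' values on the way down. The name find_dependency_path is hypothetical, and the sketch assumes each word appears only once in the tree.

```python
def find_dependency_path(forest, word, prefix=()):
    """Return the relations from the root down to `word`, or None if absent."""
    for node in forest:
        path = prefix + (node['Relationship'],)
        if node['Name'] == word:
            return list(path)
        # Leaves have no 'children' key, so fall back to an empty list.
        found = find_dependency_path(node.get('children', []), word, path)
        if found is not None:
            return found
    return None


forest = [{'Name': 'shot', 'Relationship': 'ROOT', 'children': [
    {'Name': 'I', 'Relationship': 'nsubj'},
    {'Name': 'elephant', 'Relationship': 'dobj', 'children': [
        {'Name': 'an', 'Relationship': 'det'}]},
    {'Name': 'sleep', 'Relationship': 'nmod', 'children': [
        {'Name': 'in', 'Relationship': 'case'},
        {'Name': 'my', 'Relationship': 'nmod:poss'}]},
]}]

print('-'.join(find_dependency_path(forest, 'an')))  # prints ROOT-dobj-det
```

For this forest the path to 'an' runs through 'elephant', so the relations collected are ROOT, dobj and det.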

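First, if you are only using the pre-trained model for the Stanford CoreNLP dependency parser, you should use CoreNLPDependencyParser from nltk.parse.corenlp and avoid the old nltk.parse.stanford interface.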

Stanford Parser and NLTK

After downloading and starting the Java server in a terminal, in Python:

>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> sent = "I shot an elephant with a banana .".split()
>>> parses = list(dep_parser.parse(sent))
>>> type(parses[0])
<class 'nltk.parse.dependencygraph.DependencyGraph'>

Now we see that the parse is a DependencyGraph from nltk.parse.dependencygraph: https://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36

To convert the DependencyGraph into an nltk.tree.Tree object, simply call DependencyGraph.tree():

[code block missing from the source]

To print it in bracketed parse format:

>>> print(parses[0].tree())
(shot I (elephant an) (banana with a) .)

If you are looking for the dependency triples:

>>> [(governor, dep, dependent) for governor, dep, dependent in parses[0].triples()]
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')), (('shot', 'VBD'), 'dobj', ('elephant', 'NN')), (('elephant', 'NN'), 'det', ('an', 'DT')), (('shot', 'VBD'), 'nmod', ('banana', 'NN')), (('banana', 'NN'), 'case', ('with', 'IN')), (('banana', 'NN'), 'det', ('a', 'DT')), (('shot', 'VBD'), 'punct', ('.', '.'))]

>>> for governor, dep, dependent in parses[0].triples():
...     print(governor, dep, dependent)
... 
('shot', 'VBD') nsubj ('I', 'PRP')
('shot', 'VBD') dobj ('elephant', 'NN')
('elephant', 'NN') det ('an', 'DT')
('shot', 'VBD') nmod ('banana', 'NN')
('banana', 'NN') case ('with', 'IN')
('banana', 'NN') det ('a', 'DT')
('shot', 'VBD') punct ('.', '.')

CoNLL format:

>>> print(parses[0].to_conll(style=10))
1   I   I   PRP PRP _   2   nsubj   _   _
2   shot    shoot   VBD VBD _   0   ROOT    _   _
3   an  a   DT  DT  _   4   det _   _
4   elephant    elephant    NN  NN  _   2   dobj    _   _
5   with    with    IN  IN  _   7   case    _   _
6   a   a   DT  DT  _   7   det _   _
7   banana  banana  NN  NN  _   2   nmod    _   _
8   .   .   .   .   _   2   punct   _   _
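
In the 10-column output above, column 1 is the token id, column 2 the word, column 7 the id of the head (0 for the root) and column 8 the relation. As a hedged sketch, those four columns are enough to rebuild (relation, parent, child) tuples like the ones used in the first answer. The function name conll_to_tuples is hypothetical, and the sketch assumes whitespace-separated columns with no empty fields.

```python
def conll_to_tuples(conll_text):
    """Turn 10-column CoNLL rows into (relation, parent, child) tuples."""
    rows = [line.split() for line in conll_text.splitlines() if line.strip()]
    words = {row[0]: row[1] for row in rows}  # token id -> word form

    tuples = []
    for row in rows:
        form, head_id, rel = row[1], row[6], row[7]
        # A head id of 0 marks the root of the sentence.
        parent = 'ROOT' if head_id == '0' else words[head_id]
        tuples.append((rel, parent, form))
    return tuples


conll = """\
1 I I PRP PRP _ 2 nsubj _ _
2 shot shoot VBD VBD _ 0 ROOT _ _
3 an a DT DT _ 4 det _ _
4 elephant elephant NN NN _ 2 dobj _ _
5 with with IN IN _ 7 case _ _
6 a a DT DT _ 7 det _ _
7 banana banana NN NN _ 2 nmod _ _
8 . . . . _ 2 punct _ _
"""
for item in conll_to_tuples(conll):
    print(item)
```

Looking heads up by token id rather than by word keeps the lookup correct even when a word is repeated, although the resulting tuples still identify nodes by word only.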
