Parsing Python asciitree output and printing "labelled bracket notation"

1 answer

User

#1 · Posted 2024-09-28 23:24:54

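Instead of parsing the tree, you can have SyntaxNet output everything in the CoNLL format, which is much easier to parse. The CoNLL output for your sentence looks like this: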

1       Alice   _       NOUN    NNP     _       10      nsubj   _       _
2       ,       _       .       ,       _       1       punct   _       _
3       who     _       PRON    WP      _       6       nsubj   _       _
4       had     _       VERB    VBD     _       6       aux     _       _
5       been    _       VERB    VBN     _       6       aux     _       _
6       reading _       VERB    VBG     _       1       rcmod   _       _
7       about   _       ADP     IN      _       6       prep    _       _
8       SyntaxNet       _       NOUN    NNP     _       7       pobj    _       _
9       ,       _       .       ,       _       10      punct   _       _
10      saw     _       VERB    VBD     _       0       ROOT    _       _
11      Bob     _       NOUN    NNP     _       10      dobj    _       _
12      in      _       ADP     IN      _       10      prep    _       _
13      the     _       DET     DT      _       14      det     _       _
14      hallway _       NOUN    NN      _       12      pobj    _       _
15      yesterday       _       NOUN    NN      _       10      tmod    _       _
16      .       _       .       .       _       10      punct   _       _

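The meaning of each column is described at http://ilk.uvt.nl/conll/#dataformat. For now we only care about the first column (the ID of the word), the second column (the word itself) and the seventh column (the head, in other words the parent). The root node has a parent of 0.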
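As a quick illustration of those three columns, here is a minimal sketch that pulls them out of a single row copied from the output above. It assumes the columns are separated by whitespace (tabs or spaces), which is how the sample appears here.

```python
# One row of the CoNLL output shown above.
row = "6       reading _       VERB    VBG     _       1       rcmod   _       _"

columns = row.split()        # split on any run of tabs or spaces

word_id = int(columns[0])    # column 1: ID
form = columns[1]            # column 2: FORM (the word itself)
head = int(columns[6])       # column 7: HEAD (ID of the parent, 0 for the root)

print(word_id, form, head)   # prints: 6 reading 1
```

So "reading" is word 6 and hangs under word 1, "Alice".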

To get the CoNLL format, we just need to comment out the last few lines of demo.sh (which I assume you used to get your output):

[code snippet not preserved in the source: the end of demo.sh with its last few lines commented out]

(Don't forget to comment out the backslash on the line before those.)

(For where I got this trick from, see the comment.)

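When I run demo.sh myself I get a lot of extra information that I don't need. How to get rid of that I leave for you to figure out (let me know :)). For now, I saved the relevant part to a file so that I can pipe it into the Python program I'm about to write. If you can get rid of the extra information, you should be able to pipe demo.sh directly into the Python program.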
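One possible way to drop the extra output, offered only as a sketch: keep just the lines that look like CoNLL rows. This assumes every real row has exactly 10 whitespace-separated fields with a numeric ID and HEAD, as in the sample above, and that none of the extra lines happen to match that shape. I have not checked this against actual demo.sh output; `looks_like_conll_row` is a made-up helper name and the sample lines below are invented for the demonstration.

```python
def looks_like_conll_row(line):
    """Return True for lines with 10 fields and numeric ID and HEAD columns."""
    fields = line.split()
    return len(fields) == 10 and fields[0].isdigit() and fields[6].isdigit()


# Invented stand-in for mixed program output; the first line is fake noise.
raw_output = [
    "INFO: made-up log line",
    "1\tAlice\t_\tNOUN\tNNP\t_\t2\tnsubj\t_\t_",
    "2\tsaw\t_\tVERB\tVBD\t_\t0\tROOT\t_\t_",
    "",
]

rows = [line for line in raw_output if looks_like_conll_row(line)]
print(len(rows))  # prints: 2
```

To use it as a filter in a pipe, the same test could be applied to each line of `sys.stdin` and the matching lines written to `sys.stdout`.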

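Note: I'm fairly new to Python, so feel free to improve my code.

First of all, we simply want to read the CoNLL file from the input. I like to put each word in a nice class.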

#!/usr/bin/env python

import sys

# Conll data format:
# http://ilk.uvt.nl/conll/#dataformat
#
# The only parts we need:
# 1: ID
# 2: FORM (The original word)
# 7: HEAD (The ID of its parent)

class Word:
    "A class containing the information of a single line from a conll file."

    def __init__(self, columns):
        self.id = int(columns[0])
        self.form = columns[1]
        self.head = int(columns[6])
        self.children = []

# Read the conll input and put it in a list of words.
words = []
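for line in sys.stdin:
    # Split on any whitespace (tabs or spaces). Unlike filter(), this returns
    # a list, so it can be indexed on both Python 2 and Python 3.
    columns = line.split()

    # Skip blank lines instead of crashing on them.
    if not columns:
        continue

    words.append(Word(columns))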

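Nice, but it isn't a tree structure yet. We have to do a little more work.

I could loop over the whole list a few times to find every child of every word, but that would be inefficient. Instead I group the words by their parent, so that getting every child of a given parent is just a quick lookup.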

# Sort the words by their head (parent).
lookup = [[] for _ in range(len(words) + 1)]
for word in words:
    lookup[word.head].append(word)
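To make the lookup concrete, here is a small standalone sketch using a made-up three-word sentence, with plain (id, form, head) tuples standing in for Word objects. The list has one bucket per possible head value, from 0 (the root) up to the number of words, and it is filled in a single pass.

```python
# (id, form, head) triples for "Alice saw Bob", a hand-written example.
words = [(1, "Alice", 2), (2, "saw", 0), (3, "Bob", 2)]

# One bucket per possible head value: 0 (the root) up to len(words).
lookup = [[] for _ in range(len(words) + 1)]
for word_id, form, head in words:
    lookup[head].append(form)

print(lookup)  # prints: [['saw'], [], ['Alice', 'Bob'], []]
```

Bucket 0 holds the root word, and bucket 2 holds the children of word 2 ("saw").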

Create the tree structure:

# Build a tree
def buildTree(head):
    "Find the children for the given head in the lookup, recursively"

    # Get all the children of this parent.
    children = lookup[head]

    # Get the children of the children.
    for child in children:
        child.children = buildTree(child.id)

    return children

# Get the root's child. There should only be one child. The function returns an
# array of children so just get the first one.
tree = buildTree(0)[0] # Start with head = 0 (which is the ROOT node)

To be able to print the tree in the new format, you can add some method overloads to the Word class:

def __str__(self):
    if len(self.children) == 0:
        return "[" + self.form + "]"
    else:
        return "[" + self.form + " " + "".join(str(child) for child in self.children) + "]"

def __repr__(self):
    return self.__str__()

Now you can do this:

print(tree)

And pipe the input in like this:

cat input.conll | ./my_parser.py

Or directly from SyntaxNet:

 echo "Alice, who had been reading about SyntaxNet, saw Bob in the hallway yesterday." | syntaxnet/demo.sh | ./my_parser.py
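For reference, the pieces above can be combined into one self-contained sketch that runs on an in-memory sample instead of stdin. Two things differ from the code in the answer and are my own choices: `parse_conll` is a made-up function name, and children are attached in a plain loop over the lookup rather than recursively. The sample rows are hand-written, modeled on the head relations in the sentence above, and are not actual SyntaxNet output.

```python
class Word:
    """One row of CoNLL input: ID, FORM, HEAD, plus children attached later."""

    def __init__(self, columns):
        self.id = int(columns[0])
        self.form = columns[1]
        self.head = int(columns[6])
        self.children = []

    def __str__(self):
        if not self.children:
            return "[" + self.form + "]"
        return "[" + self.form + " " + "".join(str(c) for c in self.children) + "]"


def parse_conll(lines):
    """Turn the CoNLL rows of a single sentence into the root Word of its tree."""
    words = [Word(line.split()) for line in lines if line.strip()]

    # Group the words by head so each parent's children are one index away.
    lookup = [[] for _ in range(len(words) + 1)]
    for word in words:
        lookup[word.head].append(word)

    # Every bucket is complete by now, so children can be attached directly.
    for word in words:
        word.children = lookup[word.id]

    return lookup[0][0]  # the single word whose head is 0


# Hand-written sample rows (not real parser output).
sample = """\
1 Alice _ NOUN NNP _ 2 nsubj _ _
2 saw _ VERB VBD _ 0 ROOT _ _
3 Bob _ NOUN NNP _ 2 dobj _ _
4 in _ ADP IN _ 2 prep _ _
5 the _ DET DT _ 6 det _ _
6 hallway _ NOUN NN _ 4 pobj _ _
"""

tree = parse_conll(sample.splitlines())
print(tree)  # prints: [saw [Alice][Bob][in [hallway [the]]]]
```

Like the original code, this handles one sentence at a time and assumes exactly one word has head 0.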
