Separating NLTK subtrees based on label

Posted 2024-09-26 22:12:27


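I have an NLTK parse tree and I want to split the tree's leaves based only on the "S" label. Note that the S subtrees must not overlap in their leaves.

The sentence is: He won the Gusher Marathon, finishing in 30 minutes.

The tree produced by CoreNLP is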

tree = '''(S
  (NP (PRP He))
  (VP
    (VBD won)
    (NP (DT the) (NNP Gusher) (NNP Marathon))
    (, ,)
    (S (VP (VBG finishing) (PP (IN in) (NP (CD 30) (NNS minutes))))))
  (. .))'''

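The idea is to extract both "S" subtrees and their leaves without any overlap between them. So the expected result would be "He won the Gusher Marathon" and "finishing in 30 minutes".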

# Tree manipulation
from nltk import Tree

# Extract phrases from a parsed (chunked) tree.
# phrase: label of the sub-trees to extract
# Returns: list of deep copies, collected recursively
def ExtractPhrases(myTree, phrase):
    myPhrases = []
    if myTree.label() == phrase:
        myPhrases.append(myTree.copy(True))
    for child in myTree:
        if isinstance(child, Tree):
            # Recurse into sub-trees; leaves are plain strings and are skipped
            myPhrases.extend(ExtractPhrases(child, phrase))
    return myPhrases

subtexts = set()
sep_tree = ExtractPhrases( Tree.fromstring(tree), 'S')
for sep in sep_tree:
    for subtree in sep.subtrees():
        if subtree.label()=="S":
            print(subtree)
            subtexts.add(' '.join(subtree.leaves()))
            #break

subtexts = list(subtexts)
print(subtexts)

I get the result

['He won the Gusher Marathon , finishing in 30 minutes .', 'finishing in 30 minutes']

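The outer "S" still contains the leaves of the nested "S", so the two strings overlap. I don't want to fix this at the string level but at the tree level, so the expected output would be: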

["He won the Gusher Marathon ,.",  "finishing in 30 minutes."]

