Key/value pair extraction

(NE Stallone/NNP) ('jason', 'NN') ("'s", 'POS') ('film', 'NN') (NE Rocky/NNP) ('was', 'VBD') ('inducted', 'VBN') ('into', 'IN') ('the', 'DT') (NE National/NNP Film/NNP Registry/NNP) ('as', 'IN') ('well', 'RB') ('as', 'IN') ('having', 'VBG') ('its', 'PRP$') ('film', 'NN') ('props', 'NNS') ('placed', 'VBN') ('in', 'IN') ('the', 'DT') (NE Smithsonian/NNP Museum/NNP) ('.', '.')

import nltk

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print(namedEnt)
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt if x == "NN"]
for x in namedEnt:
    if x[0] == 'NN':
        print(x[1])

1 Answer

User

#1 · Posted on 2024-09-27 21:35:02

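It looks like you need to swap the key and the value in your lookup: each tagged token is a (word, tag) tuple, so the tag is at index 1 and the word at index 0. You also have to allow for nodes that hold only a single element, which a try/except takes care of. Here is a small function that retrieves the values you want from the tree: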

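def values_for(tree, tag):
    # Collect the words of top-level (word, tag) tuples whose tag matches.
    ret = []
    for x in tree:
        try:
            if x[1] == tag:
                ret.append(x[0])
        except IndexError:
            # A named-entity subtree with a single leaf has no index 1; skip it.
            pass
    return ret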

You should then be able to filter for the nodes you need:

>>> text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
>>> tokenized = nltk.word_tokenize(text)
>>> tagged = nltk.pos_tag(tokenized)
>>> namedEnt = nltk.ne_chunk(tagged, binary = True)
>>> values_for(namedEnt, 'NN')
['jason', 'film', 'film']
>>> values_for(namedEnt, 'VBN')
['inducted', 'placed']
>>> values_for(namedEnt, 'NNP')
[]
>>> values_for(namedEnt, 'NNS')
['props']
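Note that 'NNP' comes back empty because all the proper nouns in this sentence sit inside NE subtrees, and the function only looks at top-level tuples. If you also want those, a rough sketch along these lines might work. The name words_with_tag and the recurse flag are made up for illustration, and the nested list below is a hand-built stand-in for the chunk tree so the snippet runs without NLTK; as far as I know nltk.Tree subclasses list, so the same function should accept the real namedEnt too, but I have not tested that here.

```python
def words_with_tag(tree, tag, recurse=False):
    """Return the words tagged `tag`; optionally descend into subtrees."""
    words = []
    for node in tree:
        if isinstance(node, tuple):
            # Leaf: a (word, tag) pair.
            word, node_tag = node
            if node_tag == tag:
                words.append(word)
        elif recurse:
            # Subtree (e.g. an NE chunk): look at its leaves as well.
            words.extend(words_with_tag(node, tag, recurse=True))
    return words


# Hand-built stand-in for the chunk tree: inner lists play the role of NE subtrees.
chunked = [
    [("Stallone", "NNP")],
    ("jason", "NN"), ("'s", "POS"), ("film", "NN"),
    [("Rocky", "NNP")],
    ("was", "VBD"), ("inducted", "VBN"), ("into", "IN"), ("the", "DT"),
    [("National", "NNP"), ("Film", "NNP"), ("Registry", "NNP")],
    ("as", "IN"), ("well", "RB"), ("as", "IN"), ("having", "VBG"),
    ("its", "PRP$"), ("film", "NN"), ("props", "NNS"), ("placed", "VBN"),
    ("in", "IN"), ("the", "DT"),
    [("Smithsonian", "NNP"), ("Museum", "NNP")],
    (".", "."),
]

print(words_with_tag(chunked, "NN"))
print(words_with_tag(chunked, "NNP"))
print(words_with_tag(chunked, "NNP", recurse=True))
```

Checking isinstance(node, tuple) instead of catching IndexError also avoids silently comparing a subtree's second leaf against the tag, which is what x[1] does for multi-word entities.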

Hope this helps. Cheers!
