Extracting key/value pairs from an nltk tree

Posted 2024-09-27 21:35:02

I am getting a tree structure from nltk, and when I iterate over the tree's values I get the following:

(NE Stallone/NNP)
('jason', 'NN')
("'s", 'POS')
('film', 'NN')
(NE Rocky/NNP)
('was', 'VBD')
('inducted', 'VBN')
('into', 'IN')
('the', 'DT')
(NE National/NNP Film/NNP Registry/NNP)
('as', 'IN')
('well', 'RB')
('as', 'IN')
('having', 'VBG')
('its', 'PRP$')
('film', 'NN')
('props', 'NNS')
('placed', 'VBN')
('in', 'IN')
('the', 'DT')
(NE Smithsonian/NNP Museum/NNP)
('.', '.')

How can I retrieve only the values tagged NN and VBN?

I tried this:

text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."

tokenized = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokenized)
namedEnt = nltk.ne_chunk(tagged, binary = True)
print namedEnt
np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt if x == "NN"]

for x in namedEnt:
    if x[0] == 'NN':
        print x[1]

np = [' '.join([y[0] for y in x.leaves()]) for x in namedEnt if x == "NN"] correctly gives me the NE tags, but I cannot get NN, NNP and NNS separately. Let me know if there is another way to do this.


1 answer
User
#1 · Posted 2024-09-27 21:35:02

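It looks like you need to make a small swap in your key/value lookup: the word is at index 0 and the tag is at index 1, so compare x[1] and collect x[0]. You also have to handle, with a try/except, the case where an element has only one value. Here is a small function that retrieves the values you want from the tree: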

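def values_for(tree, tag):
    """Return the words among the tree's top-level children whose tag equals `tag`."""
    ret = []
    for x in tree:
        try:
            # x is normally a (word, tag) tuple
            if x[1] == tag:
                ret.append(x[0])
        except IndexError:
            # an element with a single value has no index 1; skip it
            pass
    return ret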
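To see why the try/except is needed, it helps to look at what the loop actually iterates over: a mix of (word, tag) tuples and NE subtrees. The sketch below is only an illustration under an assumption: plain lists stand in for the NE subtrees so that it runs without nltk or its data (nltk's Tree subclasses list, so integer indexing behaves the same way). The helper name `describe` is hypothetical.

```python
# Plain lists stand in for NE subtrees (assumption for illustration only).
children = [
    [("Stallone", "NNP")],                         # one-leaf NE subtree
    ("film", "NN"),                                # ordinary (word, tag) leaf
    [("Smithsonian", "NNP"), ("Museum", "NNP")],   # two-leaf NE subtree
]


def describe(x, tag):
    """Report what the x[1] == tag check does for one top-level child."""
    try:
        second = x[1]
    except IndexError:
        # one-leaf subtree: there is no element at index 1
        return "skipped: no index 1"
    # for a multi-leaf subtree, second is itself a (word, tag) tuple,
    # which never equals a tag string
    return "match" if second == tag else "no match"


results = [describe(x, "NN") for x in children]
print(results)
```

So one-leaf chunks are skipped by the except branch, and longer chunks simply fail the comparison; either way, only top-level (word, tag) tuples can match.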

You should then be able to filter for the nodes you want:

>>> import nltk
>>> text = "Stallone jason's film Rocky was inducted into the National Film Registry as well as having its film props placed in the Smithsonian Museum."
>>> tokenized = nltk.word_tokenize(text)
>>> tagged = nltk.pos_tag(tokenized)
>>> namedEnt = nltk.ne_chunk(tagged, binary = True)
>>> values_for(namedEnt, 'NN')
['jason', 'film', 'film']
>>> values_for(namedEnt, 'VBN')
['inducted', 'placed']
>>> values_for(namedEnt, 'NNP')
[]
>>> values_for(namedEnt, 'NNS')
['props']
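Note that 'NNP' comes back empty because every NNP word in this sentence sits inside an NE subtree, and values_for only inspects top-level children. If you also want tags from inside the chunks, one option is to flatten the tree first. Below is a minimal sketch of that idea, not part of the original answer: the names `tagged_leaves` and `values_for_tags` are hypothetical, it assumes Python 3 (for `yield from`), and a hand-built nested list copied from the output in the question stands in for the real tree so the example runs without nltk data.

```python
def tagged_leaves(node):
    """Yield every (word, tag) tuple under node, depth first."""
    for child in node:
        if isinstance(child, tuple):
            yield child                       # ordinary leaf
        else:
            yield from tagged_leaves(child)   # descend into a subtree


def values_for_tags(tree, tags):
    """Return the words, in sentence order, whose tag is one of `tags`."""
    wanted = set(tags)
    return [word for word, tag in tagged_leaves(tree) if tag in wanted]


# Stand-in for the ne_chunk output in the question; inner lists are NE chunks.
sample = [
    [("Stallone", "NNP")], ("jason", "NN"), ("'s", "POS"), ("film", "NN"),
    [("Rocky", "NNP")], ("was", "VBD"), ("inducted", "VBN"), ("into", "IN"),
    ("the", "DT"),
    [("National", "NNP"), ("Film", "NNP"), ("Registry", "NNP")],
    ("as", "IN"), ("well", "RB"), ("as", "IN"), ("having", "VBG"),
    ("its", "PRP$"), ("film", "NN"), ("props", "NNS"), ("placed", "VBN"),
    ("in", "IN"), ("the", "DT"),
    [("Smithsonian", "NNP"), ("Museum", "NNP")], (".", "."),
]

print(values_for_tags(sample, ["NN", "VBN"]))
print(values_for_tags(sample, ["NNP"]))
```

With a real nltk tree, namedEnt.leaves() should give the same flat list of (word, tag) tuples, so the filter can be applied to that list directly. Passing several tags at once returns the words in sentence order rather than grouped by tag.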

Hope this helps. Cheers!
