Extracting chunks in Python using n

Posted 2024-06-01 11:53:36


Suppose I have a tagged corpus (say, the Brown corpus) and I want to extract only the words tagged '/nn'. For example:

            Daniel/np termed/vbd ``/`` extremely/rb conservative/jj ''/'' his/pp$    estimate/nn.....

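This is part of the tagged Brown corpus. What I want to do is extract words such as "estimate" (because it is tagged /nn) and add them to a list. But most of the examples I found are about tagging a corpus, and they left me quite confused. Can anyone point me to an example or tutorial for extracting words from an already-tagged corpus?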

Thanks in advance.


1 answer
User
#1 · Posted 2024-06-01 11:53:36

See: http://nltk.googlecode.com/svn/trunk/doc/book/ch05.html

>>> import nltk
>>> sent = '''
... The/AT grand/JJ jury/NN commented/VBD on/IN a/AT number/NN of/IN
... other/AP topics/NNS ,/, AMONG/IN them/PPO the/AT Atlanta/NP and/CC
... Fulton/NP-tl County/NN-tl purchasing/VBG departments/NNS which/WDT it/PPS
... said/VBD ``/`` ARE/BER well/QL operated/VBN and/CC follow/VB generally/RB
... accepted/VBN practices/NNS which/WDT inure/VB to/IN the/AT best/JJT
... interest/NN of/IN both/ABX governments/NNS ''/'' ./.
... '''
>>> [nltk.tag.str2tuple(t) for t in sent.split()]
[('The', 'AT'), ('grand', 'JJ'), ('jury', 'NN'), ('commented', 'VBD'),
('on', 'IN'), ('a', 'AT'), ('number', 'NN'), ... ('.', '.')]
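To see what str2tuple does with each token, here is a rough plain-Python equivalent. The name split_tagged is hypothetical and not part of NLTK. As far as I know, str2tuple splits at the last "/" and uppercases the tag (so "NP-tl" comes back as "NP-TL"), and returns None as the tag when there is no "/"; treat this as a sketch of that behaviour, not NLTK's own code.

```python
def split_tagged(token, sep="/"):
    """Split 'word/TAG' at the LAST separator and uppercase the tag.

    Hypothetical helper: a plain-Python sketch of what nltk.tag.str2tuple
    does, so the idea can be tried without NLTK installed.
    """
    loc = token.rfind(sep)
    if loc >= 0:
        return token[:loc], token[loc + len(sep):].upper()
    return token, None  # no separator found: the token carries no tag


for tok in ["jury/NN", "Fulton/NP-tl", "``/``", "his/pp$", "untagged"]:
    print(split_tagged(tok))
# ('jury', 'NN')
# ('Fulton', 'NP-TL')
# ('``', '``')
# ('his', 'PP$')
# ('untagged', None)
```

Splitting at the last "/" rather than the first keeps any "/" that belongs to the word itself on the word side.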

If you only want the words tagged NN, you can do the following:

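One way to keep only the NN-tagged words, sketched in plain Python so it runs without NLTK; with NLTK installed, nltk.tag.str2tuple(tok) could stand in for the rpartition step. Note that tag == "NN" is an exact match, so plural nouns (NNS) and title nouns (NN-TL) are dropped; the startswith("NN") variant keeps the whole noun family.

```python
# The tagged sentence from the answer, as a plain string.
sent = """The/AT grand/JJ jury/NN commented/VBD on/IN a/AT number/NN of/IN
other/AP topics/NNS ,/, AMONG/IN them/PPO the/AT Atlanta/NP and/CC
Fulton/NP-tl County/NN-tl purchasing/VBG departments/NNS which/WDT it/PPS
said/VBD ``/`` ARE/BER well/QL operated/VBN and/CC follow/VB generally/RB
accepted/VBN practices/NNS which/WDT inure/VB to/IN the/AT best/JJT
interest/NN of/IN both/ABX governments/NNS ''/'' ./."""

# Split each whitespace-separated token at its last '/' into (word, TAG).
pairs = []
for tok in sent.split():
    word, _, tag = tok.rpartition("/")
    pairs.append((word, tag.upper()))

# Exact match: only the bare NN tag.
nouns = [word for word, tag in pairs if tag == "NN"]
print(nouns)  # ['jury', 'number', 'interest']

# Prefix match: NN, NNS, NN-TL, ...
all_nouns = [word for word, tag in pairs if tag.startswith("NN")]
print(all_nouns)
# ['jury', 'number', 'topics', 'County', 'departments', 'practices', 'interest', 'governments']
```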

Edit:

Here sent is just a string; with the ... continuation prompts removed, it is:

sent = """The/AT grand/JJ jury/NN commented/VBD on/IN a/AT number/NN of/IN other/AP topics/NNS ,/, AMONG/IN them/PPO the/AT Atlanta/NP and/CC Fulton/NP-tl County/NN-tl purchasing/VBG departments/NNS which/WDT it/PPS said/VBD ``/`` ARE/BER well/QL operated/VBN and/CC follow/VB generally/RB accepted/VBN practices/NNS which/WDT inure/VB to/IN the/AT best/JJT interest/NN of/IN both/ABX governments/NNS ''/'' ./."""
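Applied to the lowercase Brown-style line from the question (with the trailing "....." dropped), the same idea produces the list the question asks for. This is a sketch: it compares tags case-insensitively, because the question's sample uses "nn" while the answer's sentence uses "NN", and it skips any token that has no "/" at all.

```python
# Example line from the question, trailing "....." dropped; tags are lowercase.
line = "Daniel/np termed/vbd ``/`` extremely/rb conservative/jj ''/'' his/pp$ estimate/nn"

nn_words = []  # the list of NN-tagged words the question wants
for tok in line.split():
    word, sep, tag = tok.rpartition("/")
    # sep is empty when the token has no '/', so such tokens are skipped.
    if sep and tag.lower() == "nn":
        nn_words.append(word)

print(nn_words)  # ['estimate']
```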
