NLTK RegEx Chunker not capturing defined grammar patterns with wildcards

import nltk

## Defining the POS tagger
tagger = nltk.data.load(nltk.tag._POS_TAGGER)

## A Single sentence - input text value
textv="This has allowed the device to start, and I then see glitches which is not nice."
tagged_text = tagger.tag(textv.split())

## Defining Grammar rules for Phrases
actphgrammar = r"""
     Ph: {<VB*>+<DT>?<NN*>+}  # verbal phrase - one or more verbs followed by optional determiner, and one or more nouns at the end
         {<RB*><VB*|JJ*|NN*\$>} # Adverbial phrase - Adverb followed by adjective / Noun or Verb
"""

### Parsing the defined grammar for phrases
actp = nltk.RegexpParser(actphgrammar)
actphrases = actp.parse(tagged_text)

1 answer

User
Reply #1 · posted 2024-09-29 03:27:55

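Close, but a minor change to your regex will get you the desired output. In a RegexpParser tag pattern, * is an ordinary regex quantifier on the preceding character, so VB* matches V, VB, VBB and so on, but not VBP or VBN. When you want a wildcard in a RegexpParser grammar, use .* instead of *, e.g. VB.* instead of VB*: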
>>> from nltk import word_tokenize, pos_tag, RegexpParser
>>> text = "This has allowed the device to start, and I then see glitches which is not nice."
>>> tagged_text = pos_tag(word_tokenize(text))
>>> g = r"""
... VP: {<VB.*><DT><NN.*>}
... """
>>> p = RegexpParser(g); p.parse(tagged_text)
Tree('S', [('This', 'DT'), ('has', 'VBZ'), Tree('VP', [('allowed', 'VBN'), ('the', 'DT'), ('device', 'NN')]), ('to', 'TO'), ('start', 'VB'), (',', ','), ('and', 'CC'), ('I', 'PRP'), ('then', 'RB'), ('see', 'VBP'), ('glitches', 'NNS'), ('which', 'WDT'), ('is', 'VBZ'), ('not', 'RB'), ('nice', 'JJ'), ('.', '.')])
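Note that you were catching Tree(AdvP, [('then', 'RB'), ('see', 'VB')]) only because those tags are exactly RB and VB. So in this scenario the wildcards in your grammar (i.e. the rule {<RB*><VB*|JJ*|NN*\$>}) are effectively ignored.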
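The difference between * and .* can be checked without NLTK at all. The sketch below is a simplified imitation of how a tag pattern is matched, not NLTK's actual implementation: it assumes each tag is wrapped in angle brackets and that a dot in the pattern stands for any character that is not a bracket or brace. The helper name tag_matches is hypothetical.

```python
import re

# Simplified stand-in for a tag wildcard character: anything that cannot
# open or close a tag or chunk. This mirrors the idea, not NLTK's internals.
TAG_CHAR = r"[^{}<>]"

def tag_matches(tag_pattern, tag):
    """Return True if a single-tag pattern such as 'VB*' or 'VB.*' matches tag."""
    regex = "<" + tag_pattern.replace(".", TAG_CHAR) + ">"
    return re.fullmatch(regex, "<" + tag + ">") is not None

# Compare the asker's style (VB*) with the suggested style (VB.*).
for pattern in ("VB*", "VB.*"):
    for tag in ("VB", "VBP", "VBN"):
        print(pattern, tag, tag_matches(pattern, tag))
```

Under these assumptions VB* accepts only the bare VB tag among the three, which is consistent with the original grammar catching 'then see' only when 'see' was tagged exactly VB.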
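Also, if these are two different types of phrases, it is advisable to use two labels rather than one. And (I think) the end-of-string anchor after the wildcard is somewhat redundant, so it is better to drop it.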
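As a rough sketch of that advice, the grammar below uses one label per phrase type and no end-of-string anchor. It is a reconstruction from the points made in this answer, not the answerer's exact code; the labels VP and AdvP and the variable names are assumptions. The tagged tokens are copied from the pos_tag output shown earlier, so no tagger model needs to be downloaded.

```python
from nltk import RegexpParser

# Tags copied from the pos_tag output earlier in the answer.
tagged_text = [
    ('This', 'DT'), ('has', 'VBZ'), ('allowed', 'VBN'), ('the', 'DT'),
    ('device', 'NN'), ('to', 'TO'), ('start', 'VB'), (',', ','),
    ('and', 'CC'), ('I', 'PRP'), ('then', 'RB'), ('see', 'VBP'),
    ('glitches', 'NNS'), ('which', 'WDT'), ('is', 'VBZ'), ('not', 'RB'),
    ('nice', 'JJ'), ('.', '.'),
]

# One label per phrase type; .* is the wildcard, and no \$ anchor is used.
grammar = r"""
VP: {<VB.*><DT><NN.*>}
AdvP: {<RB.*><VB.*|JJ.*|NN.*>}
"""

tree = RegexpParser(grammar).parse(tagged_text)

# Collect (label, words) for every chunk below the root 'S' node.
chunks = [
    (sub.label(), [word for word, _tag in sub.leaves()])
    for sub in tree.subtrees(filter=lambda t: t.label() != 'S')
]
for label, words in chunks:
    print(label, words)
```

The rules run in the order they are listed, so tokens already chunked as VP are no longer available to the AdvP rule.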
