A grammar production. Each production maps a single symbol
on the "left-hand side" to a sequence of symbols on the
"right-hand side". (In the case of context-free productions,
the left-hand side must be a Nonterminal, and the right-hand
side is a sequence of terminals and Nonterminals.)
"terminals" can be any immutable hashable object that is
not a Nonterminal. Typically, terminals are strings
representing words, such as "dog" or "under".
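As a quick illustration of the definition above, here is a minimal sketch that builds two productions by hand with nltk.grammar.Nonterminal and nltk.grammar.Production and inspects their sides. It assumes NLTK 3 on Python 3; the NP, DET and NOUN_NP symbols are only examples.

```python
from nltk.grammar import Nonterminal, Production, is_nonterminal, is_terminal

# Nonterminals wrap a symbol; terminals are plain hashable objects such as str.
noun_np = Nonterminal('NOUN_NP')

# A lexical production: the right-hand side contains a terminal.
lexical = Production(noun_np, ['singapore'])

# A non-lexical production: the right-hand side holds only Nonterminals.
phrasal = Production(Nonterminal('NP'), [Nonterminal('DET'), noun_np])

print(lexical)               # NOUN_NP -> 'singapore'
print(lexical.lhs())         # NOUN_NP
print(lexical.rhs())         # ('singapore',)
print(lexical.is_lexical())  # True
print(phrasal.is_lexical())  # False
print(is_terminal('singapore'), is_nonterminal(noun_np))  # True True
```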
>>> import nltk
>>> from nltk.parse.chart import ChartParser
>>> original_grammar = nltk.data.load('grammars/large_grammars/atis.cfg')
>>> original_parser = ChartParser(original_grammar)
>>> sent = ['show', 'me', 'northwest', 'flights', 'to', 'singapore', '.']
>>> for i in original_parser.parse(sent):
...     print(i)
...     break
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/api.py", line 49, in parse
    return iter(self.parse_all(sent))
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/chart.py", line 1350, in parse_all
    chart = self.chart_parse(tokens)
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/chart.py", line 1309, in chart_parse
    self._grammar.check_coverage(tokens)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 631, in check_coverage
    "input words: %r." % missing)
ValueError: Grammar does not cover some of the input words: u"'singapore'".
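The traceback comes from the grammar's check_coverage() method, which ChartParser calls before parsing and which raises ValueError when some input token is not produced by any production. A minimal sketch, assuming NLTK 3, that reports the uncovered tokens up front; the toy grammar and the missing_words helper are hypothetical, not part of NLTK.

```python
import nltk

# Hypothetical toy grammar standing in for atis.cfg.
toy = nltk.CFG.fromstring("""
S -> 'show' 'me' NP
NP -> 'flights' 'to' NOUN_NP
NOUN_NP -> 'detroit'
""")

def missing_words(grammar, tokens):
    """Return the tokens that no production in `grammar` can produce."""
    missing = []
    for token in tokens:
        try:
            grammar.check_coverage([token])  # raises ValueError if uncovered
        except ValueError:
            missing.append(token)
    return missing

sent = ['show', 'me', 'flights', 'to', 'singapore']
print(missing_words(toy, sent))  # ['singapore']
```

Checking one token at a time is wasteful for long inputs, but it keeps the sketch on the public check_coverage() call instead of the grammar's private lexical index.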
# First, let's create a Nonterminal for singapore.
>>> nltk.grammar.Nonterminal('singapore')
singapore
>>> lhs = nltk.grammar.Nonterminal('singapore')
>>> rhs = [u'singapore']
# Now we can create the Production for singapore.
>>> singapore_production = nltk.grammar.Production(lhs, rhs)
# Now let's try to add this Production to the grammar's list of productions.
>>> new_grammar = nltk.data.load('grammars/large_grammars/atis.cfg')
>>> new_grammar._productions.append(singapore_production)
>>> new_parser = ChartParser(new_grammar)
>>> sent = ['show', 'me', 'northwest', 'flights', 'to', 'singapore', '.']
>>> new_parser.parse(sent)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/api.py", line 49, in parse
    return iter(self.parse_all(sent))
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/chart.py", line 1350, in parse_all
    chart = self.chart_parse(tokens)
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/chart.py", line 1309, in chart_parse
    self._grammar.check_coverage(tokens)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 631, in check_coverage
    "input words: %r." % missing)
ValueError: Grammar does not cover some of the input words: u"'singapore'".
>>> original_grammar.productions(rhs='detroit')
[NOUN_NP -> detroit, NOUN_NP -> detroit minneapolis toronto]
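The same lookup can be done with the public CFG.productions() filter instead of the private _rhs_index. A minimal sketch on a hypothetical toy grammar, assuming NLTK 3. Note that rhs= only matches productions whose right-hand side starts with the given symbol, and lhs= returns every production for a nonterminal.

```python
import nltk
from nltk.grammar import Nonterminal

# Hypothetical toy grammar; the same calls work on the larger atis.cfg grammar.
toy = nltk.CFG.fromstring("""
NP -> 'flights' 'to' NOUN_NP
NOUN_NP -> 'detroit' | 'boston'
""")

# Productions whose right-hand side begins with the terminal 'detroit'.
detroit_prods = toy.productions(rhs='detroit')
print(detroit_prods)                  # [NOUN_NP -> 'detroit']

# The nonterminal that produces detroit, and all of its productions.
noun_np = detroit_prods[0].lhs()
print(toy.productions(lhs=noun_np))   # [NOUN_NP -> 'detroit', NOUN_NP -> 'boston']

# 'to' never starts a right-hand side, so the rhs filter finds nothing.
print(toy.productions(rhs='to'))      # []
```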
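import nltk
from nltk.grammar import Nonterminal, Production

new_grammar = nltk.data.load('grammars/large_grammars/atis.cfg')
# Add NOUN_NP -> singapore, mirroring the existing NOUN_NP -> detroit.
new_grammar._productions.append(Production(Nonterminal('NOUN_NP'), ['singapore']))
sent = ['show', 'me', 'northwest', 'flights', 'to', 'singapore', '.']
print(new_grammar.productions()[2091])
print(new_grammar.productions()[-1])  # the production just appended
new_parser = nltk.ChartParser(new_grammar)
for i in new_parser.parse(sent):
    print(i)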
[out]:
Traceback (most recent call last):
  File "test.py", line 31, in <module>
    for i in new_parser.parse(sent):
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/api.py", line 49, in parse
    return iter(self.parse_all(sent))
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/chart.py", line 1350, in parse_all
    chart = self.chart_parse(tokens)
  File "/usr/local/lib/python2.7/dist-packages/nltk/parse/chart.py", line 1309, in chart_parse
    self._grammar.check_coverage(tokens)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 631, in check_coverage
    "input words: %r." % missing)
ValueError: Grammar does not cover some of the input words: u"'singapore'".
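One likely reason the append has no effect (the traceback still ends in check_coverage) is that, in NLTK 3, a CFG builds its lookup indexes, including the lexical index that check_coverage() consults, once in its constructor; appending to the private _productions list afterwards does not update them. A minimal sketch that reproduces this on a hypothetical toy grammar:

```python
import nltk
from nltk.grammar import Nonterminal, Production

# Hypothetical toy grammar standing in for atis.cfg.
toy = nltk.CFG.fromstring("""
S -> 'flights' 'to' NOUN_NP
NOUN_NP -> 'detroit'
""")
sent = ['flights', 'to', 'singapore']
new_prod = Production(Nonterminal('NOUN_NP'), ['singapore'])

# Appending to the private list does make the rule show up in productions() ...
toy._productions.append(new_prod)
print(new_prod in toy.productions())  # True

# ... but the lexical index was built when the CFG was constructed,
# so check_coverage() (and therefore ChartParser) still rejects the word.
try:
    toy.check_coverage(sent)
    still_uncovered = False
except ValueError:
    still_uncovered = True
print(still_uncovered)  # True
```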
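In short: yes, it is possible, but you will go through a lot of pain. It is much easier to rewrite the CFG using atis.cfg as a base and then read the new CFG text file back in, because that makes it far simpler to assign each new terminal to the correct nonterminal that should produce it. The longer explanation follows.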
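The short answer above, editing the grammar text rather than the grammar object, can be sketched as follows, assuming NLTK 3. The extend_grammar_text helper is hypothetical. For atis.cfg the base text could presumably be obtained with nltk.data.load('grammars/large_grammars/atis.cfg', format='text'); that call is not exercised here, and a toy grammar keeps the example self-contained.

```python
import nltk

def extend_grammar_text(base_text, extra_rules):
    """Append extra rule lines to a CFG source text and parse it into a fresh CFG.

    Building a new CFG object means all of its indexes are computed from
    scratch, so the new terminals are visible to check_coverage().
    """
    return nltk.CFG.fromstring(base_text.rstrip('\n') + '\n' + '\n'.join(extra_rules))

# Hypothetical toy grammar text standing in for the contents of atis.cfg.
base = """
S -> 'flights' 'to' NOUN_NP
NOUN_NP -> 'detroit'
"""
grammar = extend_grammar_text(base, ["NOUN_NP -> 'singapore'"])
parser = nltk.ChartParser(grammar)
for tree in parser.parse(['flights', 'to', 'singapore']):
    print(tree)  # (S flights to (NOUN_NP singapore))
```

Because CFG.fromstring takes the start symbol from the first rule when no %start line is given, appending rules at the end leaves the start symbol unchanged.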
First, let's look at what a CFG in NLTK is and what it contains.
For details, see https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L421
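It seems that both terminals and nonterminals appear in objects of the Production type (see https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L236); in other words, the grammar stores its rules as a list of Production objects.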
Now it seems that we can create nltk.grammar.Production objects and append them to grammar._productions. Let's try it with the original grammar first.
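The original grammar does not contain the terminal singapore, so the parser rejects the sentence (see the first session above). Before we try to add singapore to the grammar, let's look at how detroit is stored in the grammar. Then we can try to create an equivalent Production object for singapore. But it still doesn't work: providing the terminal by itself does not really connect it to the rest of the CFG, so singapore still cannot be parsed.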
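From the detroit lookup, we know that singapore behaves like detroit, and detroit is produced by the left-hand side NOUN_NP, via NOUN_NP -> detroit. So we need to either add another production for singapore that leads to the NOUN_NP nonterminal, or append our singapore LHS to the right-hand side of the NOUN_NP productions. Now let's add a new production, NOUN_NP -> singapore, and hope that the parser works (see the test.py script and its output above).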
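The difference between the two options can be reproduced on a hypothetical toy grammar (a minimal sketch, assuming NLTK 3). Each grammar is rebuilt with the nltk.CFG(start, productions) constructor so that every index is computed from the full rule list. A production with a fresh Nonterminal('singapore') on its left-hand side covers the word but is unreachable from the start symbol, so nothing parses; attaching the word to NOUN_NP, like detroit, gives a parse. The rebuilt_with helper is hypothetical.

```python
import nltk
from nltk.grammar import Nonterminal, Production

# Hypothetical toy grammar standing in for atis.cfg.
toy = nltk.CFG.fromstring("""
S -> 'flights' 'to' NOUN_NP
NOUN_NP -> 'detroit'
""")
sent = ['flights', 'to', 'singapore']

def rebuilt_with(grammar, extra):
    """Return a new CFG with the same start symbol and one extra production."""
    return nltk.CFG(grammar.start(), list(grammar.productions()) + [extra])

# Option 1: singapore -> 'singapore' covers the word but is never reachable from S.
orphan = rebuilt_with(toy, Production(Nonterminal('singapore'), ['singapore']))
orphan.check_coverage(sent)  # no ValueError: the word is now covered
print(len(list(nltk.ChartParser(orphan).parse(sent))))     # 0

# Option 2: NOUN_NP -> 'singapore' attaches the word where detroit already lives.
connected = rebuilt_with(toy, Production(Nonterminal('NOUN_NP'), ['singapore']))
print(len(list(nltk.ChartParser(connected).parse(sent))))  # 1
```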
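But the grammar still doesn't seem to recognize the new terminals and nonterminals we added (the run still fails with the same ValueError), so let's try a workaround: dump our new grammar to a string and create a new grammar from that string.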
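A minimal sketch of that string round-trip on a hypothetical toy grammar, assuming NLTK 3. It serializes the productions one per line rather than using str(grammar), because str() of a CFG begins with a "Grammar with N productions" header line that CFG.fromstring would likely reject. Passing the productions straight to the nltk.CFG(start, productions) constructor is a more direct way to get the same effect. Whether every symbol in the full atis.cfg survives this round-trip is not verified here.

```python
import nltk
from nltk.grammar import Nonterminal, Production

# Hypothetical toy grammar standing in for atis.cfg.
toy = nltk.CFG.fromstring("""
S -> 'flights' 'to' NOUN_NP
NOUN_NP -> 'detroit'
""")
# Mutate the private list as in the attempts above (the indexes are now stale).
toy._productions.append(Production(Nonterminal('NOUN_NP'), ['singapore']))

# Serialize one production per line; str(Production) quotes the terminals.
grammar_text = '\n'.join(str(prod) for prod in toy.productions())

# Re-parse the text: the new CFG computes all indexes from the full rule list.
# The start symbol is the left-hand side of the first line, i.e. still S.
new_grammar = nltk.CFG.fromstring(grammar_text)

sent = ['flights', 'to', 'singapore']
for tree in nltk.ChartParser(new_grammar).parse(sent):
    print(tree)  # (S flights to (NOUN_NP singapore))
```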