Unexpected-characters error when parsing Roman numerals with the lark parser (EBNF grammar)

Posted 2024-09-29 19:31:56


I am using the following grammar in the lark parser to parse letters and Roman numerals. The grammar is:

DIGIT: "0".."9"
INT: DIGIT+
_L_PAREN: "("
_R_PAREN: ")"
LCASE_LETTER: "a".."z"
ROMAN_NUMERALS: "viii" | "vii" | "iii" | "ii" | "ix" | "vi" | "iv" | "v" | "i" | "x"


?start: qns_num qns_alphabet  qns_part
qns_num: INT?
qns_alphabet: _L_PAREN LCASE_LETTER _R_PAREN | LCASE_LETTER _R_PAREN | LCASE_LETTER?
qns_part: _L_PAREN ROMAN_NUMERALS _R_PAREN | ROMAN_NUMERALS _R_PAREN | ROMAN_NUMERALS?

When I use this grammar to parse the following text, I get an exception:

from lark import Lark

# lark.exceptions.UnexpectedCharacters: No terminal defined for 'i' at line 1 col 5
# 10i)i)
#     ^
result = Lark(grammar, parser='lalr').parse("10i)i)")

For the life of me, I can't figure out why this raises an exception. But this works fine:

result = Lark(grammar, parser='lalr').parse("10(i)(i)")  # no error

1 answer
Anonymous user
Reply #1 · posted 2024-09-29 19:31:56

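This happens because both qns_alphabet and qns_part can be empty. After the number, the parser accepts either an LCASE_LETTER (to start qns_alphabet) or a ROMAN_NUMERALS (treating qns_alphabet as empty and moving on to qns_part). Since "i" matches both terminals, the lexer has to pick one. It picks the one its priority rules favor, ROMAN_NUMERALS, which effectively skips over qns_alphabet.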

With qns_alphabet empty and qns_part already matched as "i)", the parser expects end of input, not more text, so the second "i" (column 5) is rejected. Starting with "(" forces qns_alphabet to be non-empty, which is why 10(i)(i) parses.

So changing the priority of LCASE_LETTER won't help, but not allowing the first rule (qns_alphabet) to be empty will.

The Earley algorithm (parser='earley' in lark) resolves this ambiguity automatically.

I asked the same question on the lark-parser GitHub page, and the answer above comes from there.
