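I need to parse a (relatively) simple, line-oriented language. I didn't invent it: it is the definition language for PlantUML diagrams.
My test input is very simple: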
@startuml
Alice -> Bob: Authentication Request
Bob --> Alice: Authentication Response
Alice -> Bob: Another authentication Request
Alice <-- Bob: another authentication Response
@enduml
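The problem is that everything after the colon (':') must be treated as a (possibly escaped) string up to the first newline ('\n'), and any punctuation inside it must be ignored completely.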
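Here is a minimal sketch of the intended split, using plain Python string handling rather than PLY. The helper `split_message` is hypothetical. It cuts a line at its first colon and keeps everything after it verbatim, internal punctuation included. Escape sequences are not handled (an assumption made for brevity).

```python
def split_message(line):
    """Split a PlantUML-style message line at its FIRST colon (hypothetical helper).

    Everything after that colon, up to the end of the line, is kept verbatim,
    so later colons or arrows inside the message are not treated as syntax.
    Escapes are not handled in this sketch.
    """
    head, sep, text = line.rstrip("\n").partition(":")
    if not sep:
        return head.strip(), None  # the line has no message part
    return head.strip(), text.strip()
```

For example, `split_message("Bob --> Alice: a: b -> c")` keeps `"a: b -> c"` intact as the message text.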
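Note: for simplicity, what follows is only an excerpt of the grammar. I can post the complete test program if that would be useful.
The first error shows up in the parser debug output: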
yacc.py: 360:PLY: PARSE DEBUG START
yacc.py: 408:
yacc.py: 409:State : 0
yacc.py: 433:Stack : . LexToken(BEGIN,'@startuml',1,0)
yacc.py: 443:Action : Shift and goto state 2
yacc.py: 408:
yacc.py: 409:State : 2
yacc.py: 433:Stack : BEGIN . LexToken(newline,'\n',1,9)
yacc.py: 443:Action : Shift and goto state 11
yacc.py: 408:
yacc.py: 409:State : 11
yacc.py: 433:Stack : BEGIN newline . LexToken(IDENT,'Alice',2,10)
yacc.py: 469:Action : Reduce rule [begin -> BEGIN newline] with ['@startuml','\n'] and goto state 1
yacc.py: 504:Result : <NoneType @ 0x5584868800e0> (None)
yacc.py: 408:
yacc.py: 409:State : 1
yacc.py: 433:Stack : begin . LexToken(IDENT,'Alice',2,10)
yacc.py: 443:Action : Shift and goto state 8
yacc.py: 408:
yacc.py: 409:State : 8
yacc.py: 433:Stack : begin IDENT . LexToken(RARROW1,'->',2,16)
yacc.py: 469:Action : Reduce rule [node -> IDENT] with ['Alice'] and goto state 10
yacc.py: 504:Result : <Node @ 0x7fa389dae9e8> ([[Alice]])
yacc.py: 408:
yacc.py: 409:State : 10
yacc.py: 433:Stack : begin node . LexToken(RARROW1,'->',2,16)
yacc.py: 443:Action : Shift and goto state 20
yacc.py: 408:
yacc.py: 409:State : 20
yacc.py: 433:Stack : begin node RARROW1 . LexToken(IDENT,'Bob',2,19)
yacc.py: 469:Action : Reduce rule [rarrow -> RARROW1] with ['->'] and goto state 22
yacc.py: 504:Result : <str @ 0x7fa389daea78> ('->')
yacc.py: 408:
yacc.py: 409:State : 22
yacc.py: 433:Stack : begin node rarrow . LexToken(IDENT,'Bob',2,19)
yacc.py: 443:Action : Shift and goto state 8
yacc.py: 408:
yacc.py: 409:State : 8
yacc.py: 433:Stack : begin node rarrow IDENT . LexToken(ENDLINE,': Authentication Request',2,22)
yacc.py: 578:Error : begin node rarrow IDENT . LexToken(ENDLINE,': Authentication Request',2,22)
yacc.py: 408:
yacc.py: 409:State : 8
yacc.py: 433:Stack : begin node rarrow IDENT . LexToken(newline,'\n',2,46)
yacc.py: 469:Action : Reduce rule [node -> IDENT] with ['Bob'] and goto state 26
yacc.py: 504:Result : <Node @ 0x7fa389daeb00> ([[Bob]])
yacc.py: 408:
yacc.py: 409:State : 26
yacc.py: 433:Stack : begin node rarrow node . LexToken(newline,'\n',2,46)
yacc.py: 469:Action : Reduce rule [trans -> node rarrow node] with [[[Alice]],'->',[[Bob]]] and goto state 9
yacc.py: 504:Result : <Trans @ 0x7fa389daea58> ([[Alice]] --> [[Bob]])
yacc.py: 408:
yacc.py: 409:State : 9
yacc.py: 433:Stack : begin trans . LexToken(newline,'\n',2,46)
yacc.py: 443:Action : Shift and goto state 16
yacc.py: 408:
yacc.py: 409:State : 16
yacc.py: 433:Stack : begin trans newline . LexToken(IDENT,'Bob',3,47)
yacc.py: 469:Action : Reduce rule [tranc -> trans newline] with [<Trans @ 0x7fa389daea58>,'\n'] and goto state 4
yacc.py: 504:Result : <Trans @ 0x7fa389daea58> ([[Alice]] --> [[Bob]])
yacc.py: 408:
yacc.py: 409:State : 4
yacc.py: 433:Stack : begin tranc . LexToken(IDENT,'Bob',3,47)
yacc.py: 469:Action : Reduce rule [diag -> tranc] with [<Trans @ 0x7fa389daea58>] and goto state 5
yacc.py: 504:Result : <Trans @ 0x7fa389daea58> ([[Alice]] --> [[Bob]])
yacc.py: 408:
yacc.py: 409:State : 5
yacc.py: 433:Stack : begin diag . LexToken(IDENT,'Bob',3,47)
yacc.py: 469:Action : Reduce rule [diags -> diag] with [<Trans @ 0x7fa389daea58>] and goto state 6
yacc.py: 504:Result : <list @ 0x7fa389db3ac8> ([[[Alice]] --> [[Bob]]])
yacc.py: 408:
yacc.py: 409:State : 6
yacc.py: 433:Stack : begin diags . LexToken(IDENT,'Bob',3,47)
yacc.py: 443:Action : Shift and goto state 8
yacc.py: 408:
yacc.py: 409:State : 8
yacc.py: 433:Stack : begin diags IDENT . LexToken(RARROW2,'-->',3,51)
yacc.py: 469:Action : Reduce rule [node -> IDENT] with ['Bob'] and goto state 10
yacc.py: 504:Result : <Node @ 0x7fa389daeb00> ([[Bob]])
yacc.py: 408:
yacc.py: 409:State : 10
yacc.py: 433:Stack : begin diags node . LexToken(RARROW2,'-->',3,51)
yacc.py: 443:Action : Shift and goto state 21
yacc.py: 408:
yacc.py: 409:State : 21
yacc.py: 433:Stack : begin diags node RARROW2 . LexToken(IDENT,'Alice',3,55)
yacc.py: 469:Action : Reduce rule [rarrow -> RARROW2] with ['-->'] and goto state 22
yacc.py: 504:Result : <str @ 0x7fa389daeb90> ('-->')
yacc.py: 408:
yacc.py: 409:State : 22
yacc.py: 433:Stack : begin diags node rarrow . LexToken(IDENT,'Alice',3,55)
yacc.py: 443:Action : Shift and goto state 8
yacc.py: 408:
yacc.py: 409:State : 8
yacc.py: 433:Stack : begin diags node rarrow IDENT . LexToken(ENDLINE,': Authentication Response',3,60)
yacc.py: 578:Error : begin diags node rarrow IDENT . LexToken(ENDLINE,': Authentication Response',3,60)
yacc.py: 408:
yacc.py: 409:State : 8
yacc.py: 433:Stack : begin diags node rarrow IDENT . LexToken(newline,'\n',3,85)
yacc.py: 469:Action : Reduce rule [node -> IDENT] with ['Alice'] and goto state 26
yacc.py: 504:Result : <Node @ 0x7fa389dae9e8> ([[Alice]])
yacc.py: 408:
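As you can see, the token following the second IDENT ('Bob') is an ENDLINE (': Authentication Request'). Its first character is the colon, and this throws the parser off completely.
What is the recommended way to deal with this?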
The fact that this lexer works at all is a consequence of the peculiar order in which Ply applies lexical rules. [Note 1]
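Lexical analysis is simplest when the input can be split into a sequence of lexemes, each of which can be recognized without considering the lexemes before it. This is the default model of every tokenizer framework. In that model, a lexical pattern defined as "anything up to the end of the line" always applies. As a result, your input is split into newlines and whole other lines, which is probably not what you want.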
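The following standalone illustration uses Python's `re` module, not PLY itself. The alternatives of a combined pattern are tried from left to right, and the first one that matches wins. So if the "rest of the line" rule is tried first, it swallows every line whole. The rule names `LINE`, `ARROW`, `IDENT` and `NEWLINE` are hypothetical.

```python
import re

# A combined ("master") pattern: alternatives are tried left to right and the
# first one that matches wins. The catch-all rule is deliberately listed first.
# It is written as ".+" rather than ".*" so it can never match the empty string.
MASTER = re.compile(
    r"(?P<LINE>.+)"          # anything up to the end of the line
    r"|(?P<ARROW>-->|->)"
    r"|(?P<IDENT>[A-Za-z]+)"
    r"|(?P<NEWLINE>\n)"
)

def scan(text):
    """Return (rule name, lexeme) pairs; each token is chosen without context."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"no rule matches at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

If you call `scan` on the first two message lines of the test input, it returns only `LINE` and `NEWLINE` tokens. Each whole line, colon and arrows included, becomes one `LINE` lexeme, and `ARROW` and `IDENT` never fire.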
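It looks like the token you want is really "a colon followed by the rest of the line", so there is no need to split the colon and the rest of the line into two tokens. If so, the pattern is easy to write: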
r':.*'
(This won't work if colons are also used elsewhere for other purposes. Hopefully they aren't.)
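Here is a quick check of that pattern with Python's `re` module, run over the test input. Because `.` does not match a newline, `:.*` stops at the end of each line. The colon stays in the lexeme, so a parser action (hypothetical here) would strip it with something like `value[1:].strip()`.

```python
import re

MESSAGE = re.compile(r":.*")  # a colon followed by the rest of the line

source = """@startuml
Alice -> Bob: Authentication Request
Bob --> Alice: Authentication Response
Alice -> Bob: Another authentication Request
Alice <-- Bob: another authentication Response
@enduml
"""

# '.' does not match '\n', so each match ends at its own line break.
messages = [m.group() for m in MESSAGE.finditer(source)]

# The lexeme keeps the leading colon; drop it before using the text.
texts = [msg[1:].strip() for msg in messages]
```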
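Notes:
[1] Ply checks patterns in the following order: first all rules defined as functions, in the order in which they appear in the file; then all rules defined as strings, sorted by decreasing regular-expression length (longer patterns first).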
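The ordering described in that note can be modelled in a few lines of plain Python. This is a model of the documented rule, not PLY's actual code. Function rules keep their definition order, and string rules are sorted by decreasing regex length. Because Python's sort is stable, equal-length patterns such as `.*` and `->` keep whatever order they were collected in. The rule names `COLON`, `ARROW` and `REST` are hypothetical.

```python
import re

def master_order(function_rules, string_rules):
    """Model of the documented PLY ordering (not PLY's implementation).

    function_rules: (name, regex) pairs in definition order, kept as-is.
    string_rules:   (name, regex) pairs, sorted by decreasing regex length;
                    the sort is stable, so ties keep their incoming order.
    """
    by_length = sorted(string_rules, key=lambda rule: len(rule[1]), reverse=True)
    return list(function_rules) + by_length

def first_match(rules, text):
    """Name of the rule that wins at the start of text when rules are tried in order."""
    pattern = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in rules))
    m = pattern.match(text)
    return m.lastgroup if m else None

# The same three string rules, collected in two different orders.
rules_a = master_order([], [("COLON", r":"), ("ARROW", r"->"), ("REST", r".*")])
rules_b = master_order([], [("COLON", r":"), ("REST", r".*"), ("ARROW", r"->")])
```

In both orders `COLON` is tried last, so `REST` always wins on `": Authentication Request"`. On `"-> Bob"`, `ARROW` wins only in `rules_a`, where it happened to be collected before `REST`.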
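Because the pattern `.*` is longer than the pattern `:`, it is tried first, so the colon will never be recognized. I believe it is pure luck that `->` is matched before `.*`: the two patterns have the same length, and you should not rely on the length ordering for patterns of equal length. In general, it is better to use one of the following strategies: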
- Use only token functions, and put them in the correct order by hand.
- Use token variables only for unambiguous patterns.
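To illustrate the first strategy, here is a standalone tokenizer in plain Python (again `re`, not PLY). Its rules are tried strictly in the order they are listed, so the order is stated explicitly instead of being derived from pattern lengths. The token names BEGIN, IDENT, RARROW1, RARROW2, ENDLINE and newline mirror the debug output above. END, LARROW2, SKIP and the exact regexes are assumptions. In Ply itself, the equivalent would be to write each rule as a token function in this order.

```python
import re

# Rules tried strictly in this order (first match wins). The order is chosen by
# hand: longer arrows are listed before shorter ones, and ENDLINE takes the
# colon together with the rest of the line as a single token.
RULES = [
    ("BEGIN", r"@startuml"),
    ("END", r"@enduml"),
    ("ENDLINE", r":.*"),
    ("LARROW2", r"<--"),
    ("RARROW2", r"-->"),
    ("RARROW1", r"->"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("newline", r"\n"),
    ("SKIP", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in RULES))

def tokenize(text):
    """Return (type, value) pairs; whitespace is dropped, unknown characters raise."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With this order, a line such as `Alice <-- Bob: another authentication Response` comes out as IDENT, LARROW2, IDENT, ENDLINE, newline. The message text, colon included, arrives as one ENDLINE token for the grammar to consume.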