PLY: parsing a line-oriented grammar

Posted 2024-09-30 22:26:07


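I need to parse a (relatively) simple, line-oriented language. (I did not invent it; it is the definition language for PlantUML diagrams.)

My test input is very simple: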

@startuml
Alice -> Bob: Authentication Request
Bob --> Alice: Authentication Response
Alice -> Bob: Another authentication Request
Alice <-- Bob: another authentication Response
@enduml

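The problem arises because anything after the colon (':') should be treated as a (possibly escaped) string up to the first newline ('\n'), completely ignoring any punctuation inside it.

Note: for simplicity, what follows is only an excerpt of the grammar; I can post the complete test program if that would be useful.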

^{pr2}$

The first error is [error text missing from the source].

The parser debug output is:

   yacc.py: 360:PLY: PARSE DEBUG START
   yacc.py: 408:
   yacc.py: 409:State  : 0
   yacc.py: 433:Stack  : . LexToken(BEGIN,'@startuml',1,0)
   yacc.py: 443:Action : Shift and goto state 2
   yacc.py: 408:
   yacc.py: 409:State  : 2
   yacc.py: 433:Stack  : BEGIN . LexToken(newline,'\n',1,9)
   yacc.py: 443:Action : Shift and goto state 11
   yacc.py: 408:
   yacc.py: 409:State  : 11
   yacc.py: 433:Stack  : BEGIN newline . LexToken(IDENT,'Alice',2,10)
   yacc.py: 469:Action : Reduce rule [begin -> BEGIN newline] with ['@startuml','\n'] and goto state 1
   yacc.py: 504:Result : <NoneType @ 0x5584868800e0> (None)
   yacc.py: 408:
   yacc.py: 409:State  : 1
   yacc.py: 433:Stack  : begin . LexToken(IDENT,'Alice',2,10)
   yacc.py: 443:Action : Shift and goto state 8
   yacc.py: 408:
   yacc.py: 409:State  : 8
   yacc.py: 433:Stack  : begin IDENT . LexToken(RARROW1,'->',2,16)
   yacc.py: 469:Action : Reduce rule [node -> IDENT] with ['Alice'] and goto state 10
   yacc.py: 504:Result : <Node @ 0x7fa389dae9e8> ([[Alice]])
   yacc.py: 408:
   yacc.py: 409:State  : 10
   yacc.py: 433:Stack  : begin node . LexToken(RARROW1,'->',2,16)
   yacc.py: 443:Action : Shift and goto state 20
   yacc.py: 408:
   yacc.py: 409:State  : 20
   yacc.py: 433:Stack  : begin node RARROW1 . LexToken(IDENT,'Bob',2,19)
   yacc.py: 469:Action : Reduce rule [rarrow -> RARROW1] with ['->'] and goto state 22
   yacc.py: 504:Result : <str @ 0x7fa389daea78> ('->')
   yacc.py: 408:
   yacc.py: 409:State  : 22
   yacc.py: 433:Stack  : begin node rarrow . LexToken(IDENT,'Bob',2,19)
   yacc.py: 443:Action : Shift and goto state 8
   yacc.py: 408:
   yacc.py: 409:State  : 8
   yacc.py: 433:Stack  : begin node rarrow IDENT . LexToken(ENDLINE,': Authentication Request',2,22)
   yacc.py: 578:Error  : begin node rarrow IDENT . LexToken(ENDLINE,': Authentication Request',2,22)
   yacc.py: 408:
   yacc.py: 409:State  : 8
   yacc.py: 433:Stack  : begin node rarrow IDENT . LexToken(newline,'\n',2,46)
   yacc.py: 469:Action : Reduce rule [node -> IDENT] with ['Bob'] and goto state 26
   yacc.py: 504:Result : <Node @ 0x7fa389daeb00> ([[Bob]])
   yacc.py: 408:
   yacc.py: 409:State  : 26
   yacc.py: 433:Stack  : begin node rarrow node . LexToken(newline,'\n',2,46)
   yacc.py: 469:Action : Reduce rule [trans -> node rarrow node] with [[[Alice]],'->',[[Bob]]] and goto state 9
   yacc.py: 504:Result : <Trans @ 0x7fa389daea58> ([[Alice]] --> [[Bob]])
   yacc.py: 408:
   yacc.py: 409:State  : 9
   yacc.py: 433:Stack  : begin trans . LexToken(newline,'\n',2,46)
   yacc.py: 443:Action : Shift and goto state 16
   yacc.py: 408:
   yacc.py: 409:State  : 16
   yacc.py: 433:Stack  : begin trans newline . LexToken(IDENT,'Bob',3,47)
   yacc.py: 469:Action : Reduce rule [tranc -> trans newline] with [<Trans @ 0x7fa389daea58>,'\n'] and goto state 4
   yacc.py: 504:Result : <Trans @ 0x7fa389daea58> ([[Alice]] --> [[Bob]])
   yacc.py: 408:
   yacc.py: 409:State  : 4
   yacc.py: 433:Stack  : begin tranc . LexToken(IDENT,'Bob',3,47)
   yacc.py: 469:Action : Reduce rule [diag -> tranc] with [<Trans @ 0x7fa389daea58>] and goto state 5
   yacc.py: 504:Result : <Trans @ 0x7fa389daea58> ([[Alice]] --> [[Bob]])
   yacc.py: 408:
   yacc.py: 409:State  : 5
   yacc.py: 433:Stack  : begin diag . LexToken(IDENT,'Bob',3,47)
   yacc.py: 469:Action : Reduce rule [diags -> diag] with [<Trans @ 0x7fa389daea58>] and goto state 6
   yacc.py: 504:Result : <list @ 0x7fa389db3ac8> ([[[Alice]] --> [[Bob]]])
   yacc.py: 408:
   yacc.py: 409:State  : 6
   yacc.py: 433:Stack  : begin diags . LexToken(IDENT,'Bob',3,47)
   yacc.py: 443:Action : Shift and goto state 8
   yacc.py: 408:
   yacc.py: 409:State  : 8
   yacc.py: 433:Stack  : begin diags IDENT . LexToken(RARROW2,'-->',3,51)
   yacc.py: 469:Action : Reduce rule [node -> IDENT] with ['Bob'] and goto state 10
   yacc.py: 504:Result : <Node @ 0x7fa389daeb00> ([[Bob]])
   yacc.py: 408:
   yacc.py: 409:State  : 10
   yacc.py: 433:Stack  : begin diags node . LexToken(RARROW2,'-->',3,51)
   yacc.py: 443:Action : Shift and goto state 21
   yacc.py: 408:
   yacc.py: 409:State  : 21
   yacc.py: 433:Stack  : begin diags node RARROW2 . LexToken(IDENT,'Alice',3,55)
   yacc.py: 469:Action : Reduce rule [rarrow -> RARROW2] with ['-->'] and goto state 22
   yacc.py: 504:Result : <str @ 0x7fa389daeb90> ('-->')
   yacc.py: 408:
   yacc.py: 409:State  : 22
   yacc.py: 433:Stack  : begin diags node rarrow . LexToken(IDENT,'Alice',3,55)
   yacc.py: 443:Action : Shift and goto state 8
   yacc.py: 408:
   yacc.py: 409:State  : 8
   yacc.py: 433:Stack  : begin diags node rarrow IDENT . LexToken(ENDLINE,': Authentication Response',3,60)
   yacc.py: 578:Error  : begin diags node rarrow IDENT . LexToken(ENDLINE,': Authentication Response',3,60)
   yacc.py: 408:
   yacc.py: 409:State  : 8
   yacc.py: 433:Stack  : begin diags node rarrow IDENT . LexToken(newline,'\n',3,85)
   yacc.py: 469:Action : Reduce rule [node -> IDENT] with ['Alice'] and goto state 26
   yacc.py: 504:Result : <Node @ 0x7fa389dae9e8> ([[Alice]])
   yacc.py: 408:

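As you can see, the token following the second IDENT ('Bob') is an ENDLINE (': Authentication Request') that has the colon as its first character, and this throws the parser off completely.

What is the recommended way to solve this?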


1 answer

Answer 1 · posted 2024-09-30 22:26:07

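That this lexer works even a little is a result of the peculiar order in which Ply applies lexical rules. [Note 1]

Lexical analysis is simplest when the input can be split into a sequence of lexemes, each of which can be recognized without regard to the lexemes before it. That is the default model of any tokenizer framework. In that model, a lexical pattern defined as "anything up to the end of the line" always applies, which means your input would be split into newlines and whole lines. That is probably not what you want.

It looks as though the token is really "a colon followed by the rest of the line", so there is no need to split the colon and the rest of the line into two tokens. If so, the pattern is easy to write: r':.*'. (This will not work if the colon is used for other purposes elsewhere. Hopefully it is not.)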
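The suggestion above can be tried without Ply. The following is a minimal sketch using only the standard `re` module, not Ply itself: it tries the patterns in the order listed, which roughly corresponds to writing every rule as a token function. The names BEGIN, IDENT, RARROW1, RARROW2, ENDLINE and newline are taken from the debug output in the question; END, LARROW2 and SKIP are hypothetical names, and the IDENT pattern is an assumption because the question's lexer is not shown.

```python
import re

# BEGIN, IDENT, RARROW1, RARROW2, ENDLINE and newline appear in the debug
# output of the question; END, LARROW2 and SKIP are hypothetical names, and
# the IDENT pattern is an assumption.
TOKEN_SPEC = [
    ("BEGIN",   r"@startuml"),
    ("END",     r"@enduml"),
    ("ENDLINE", r":.*"),          # a colon plus the rest of the line, one token
    ("RARROW2", r"-->"),
    ("RARROW1", r"->"),
    ("LARROW2", r"<--"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("newline", r"\n"),
    ("SKIP",    r"[ \t]+"),       # whitespace between tokens, discarded
]

# Alternatives are tried left to right, so the list order is the rule order.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_type, lexeme) pairs for the given text."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for tok in tokenize("Alice -> Bob: Authentication Request\n"):
    print(tok)
```

Because `.` does not match a newline by default, ENDLINE stops just before the '\n', so the newline still arrives as its own token. The grammar would then have to accept an ENDLINE after the second node.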

The order of the two symbols in [code missing from the source] does not match.


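Notes:

  1. Ply checks patterns in the following order:

    • First, patterns from token functions, in the order in which the functions are defined in the file.
    • Second, patterns from token variables, sorted in reverse order of length (that is, longest to shortest).

    Since the pattern .* is longer than the pattern :, it is tried first, so the colon will never be recognized. I believe it is pure luck that -> is matched before .*. The relative order of patterns of the same length should not be relied on.

    In general, it is best to use one of the following strategies:

    • Use only token functions, and put them in the correct order by hand.

    • Use token variables only for unambiguous patterns.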
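    The ordering described in this note can be illustrated with a small simulation. This is a sketch of the rule as stated above, not Ply's own code; the function `ply_like_order` and the token names COLON and REST are hypothetical.

```python
def ply_like_order(function_rules, string_rules):
    """Simulate the order described in the note (not Ply's own code).

    function_rules: (name, pattern) pairs in definition order.
    string_rules:   (name, pattern) pairs from token variables.
    """
    # String patterns are sorted by decreasing pattern length. Python's sort
    # is stable, so patterns of equal length keep their input order here.
    by_length = sorted(string_rules, key=lambda rule: len(rule[1]), reverse=True)
    return list(function_rules) + by_length


# COLON and REST are hypothetical names for the ':' and '.*' patterns.
rules = [("COLON", r":"), ("RARROW1", r"->"), ("REST", r".*"), ("RARROW2", r"-->")]
print([name for name, _ in ply_like_order([], rules)])
# ['RARROW2', 'RARROW1', 'REST', 'COLON']

# Swapping the two length-2 patterns in the input flips their relative order,
# which is why a tie between equal-length patterns is fragile.
swapped = [("COLON", r":"), ("REST", r".*"), ("RARROW1", r"->"), ("RARROW2", r"-->")]
print([name for name, _ in ply_like_order([], swapped)])
# ['RARROW2', 'REST', 'RARROW1', 'COLON']
```

    In both runs COLON comes last, after REST, which matches the observation that the colon is never recognized on its own.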
