pyparsing: parseAll=True does not raise ParseException when ignoring comments

def test():
    import pyparsing as p

    unquoted_exclude = "\\\"" + "':/|<>,;#"
    unquoted_chars = ''.join(set(p.printables) - set(unquoted_exclude))
    unquotedkey = p.Word(unquoted_chars)

    more = p.OneOrMore(unquotedkey)
    more.ignore("#" + p.restOfLine)
    # ^^ "more" should ignore comments, but not "unquotedkey" !!

    def parse(parser, input_, parseAll=True):
        # Print the input, then either the parsed tokens or the error.
        try:
            print(input_)
            print(parser.parseString(input_, parseAll).asList())
        except Exception as err:
            print(err)

    parse(unquotedkey, "abc#d")
    parse(unquotedkey, "abc|d")

    withstringend = unquotedkey + p.stringEnd
    parse(withstringend, "abc#d", False)
    parse(withstringend, "abc|d", False)

abc#d
['abc']     <--- should throw an exception but does not
abc|d
Expected end of text (at char 3), (line:1, col:4)
abc#d
Expected stringEnd (at char 3), (line:1, col:4)
abc|d
Expected stringEnd (at char 3), (line:1, col:4)

1 answer

User

#1 · Posted 2024-10-03 13:19:26

To compare apples to apples, you should also add this line after defining withstringend:

withstringend.ignore('#' + p.restOfLine)

I think you will then see the same behavior as in your test that parses with unquotedKey.

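The purpose of ignore is to skip a construct anywhere in the parsed input text, not just at the top level. In a C program, for example, it is not enough to ignore comments that sit between statements.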

You also have to ignore comments that can appear anywhere:

x /* this is a post-increment 
so it really won't add 1 to x until after the
statement executes */ ++
/* and this is the trailing semicolon 
for the previous statement -> */;

Or, perhaps less contrived:

for (x = ptr; /* start at ptr */
     *x; /* keep going as long as we point to non-zero */
     x++ /* add one to x */ )

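To support this, ignore() is implemented to recurse through the entire defined parser and update the list of ignorable expressions on every sub-parser, so that ignorables are skipped at every level of the overall parser. The alternative would be to sprinkle calls to ignore throughout your parser definition and to keep hunting for the ones you accidentally left out.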
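
The side effect of that recursion can be observed directly. This is only a minimal sketch, assuming a pyparsing release that still accepts the camelCase names used in this thread (parseString, restOfLine); the helper parses_fully and the simplified key expression are made up for illustration.

```python
import pyparsing as p


def parses_fully(parser, text):
    """Return True if `parser` accepts all of `text` with parseAll=True."""
    try:
        parser.parseString(text, parseAll=True)
        return True
    except p.ParseException:
        return False


key = p.Word(p.alphas)

# Before any ignore() call, the trailing "#d" counts as unparsed input.
before = parses_fully(key, "abc#d")

# ignore() on the enclosing expression also registers the comment
# expression on the contained `key` object, because `more` wraps `key` itself.
more = p.OneOrMore(key)
more.ignore("#" + p.restOfLine)

# `key` on its own now skips the comment before the end-of-text check.
after = parses_fully(key, "abc#d")

print(before, after)
```

The second call succeeds only because more and key share the same Word object.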

So in your first case, when you did this:

more = p.OneOrMore(unquotedKey)
more.ignore('#' + p.restOfLine)

you also updated the ignorables of unquotedKey. If you want to isolate unquotedKey so that it does not pick up this side effect, define more like this instead:

more = p.OneOrMore(unquotedKey.copy())
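
As a rough check of the copy() approach, the sketch below wraps a copy and then confirms that the original expression still rejects a trailing comment. It assumes the same camelCase pyparsing names as the rest of this thread; key is a simplified stand-in for unquotedKey, and the sample inputs are invented.

```python
import pyparsing as p

comment = "#" + p.restOfLine
key = p.Word(p.alphas)

# Wrap a copy, so that ignore() updates the copy rather than `key`.
more = p.OneOrMore(key.copy())
more.ignore(comment)

# `more` skips the trailing comment and returns only the keys.
more_result = more.parseString("abc def # note", parseAll=True).asList()

# The original `key` has no ignorables, so parseAll=True rejects "#d".
try:
    key.parseString("abc#d", parseAll=True)
    key_rejects_comment = False
except p.ParseException:
    key_rejects_comment = True

print(more_result, key_rejects_comment)
```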

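Another point: your definition of an unquoted key is "all printables except these special characters". The technique you used was the right one until version 1.5.6, when the excludeChars parameter was added to the Word class. Now you don't have to go to the trouble of building a string of only the allowed characters; you can let Word do that work. Try: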

unquotedKey = p.Word(p.printables,
                     excludeChars = r'\"' + "':/|<>,;#")
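
To see that the two definitions accept the same input, here is a small comparison sketch. It assumes a pyparsing version that accepts the excludeChars keyword (1.5.6 or later, per the note above); the sample strings are made up for illustration.

```python
import pyparsing as p

excluded = r'\"' + "':/|<>,;#"

# Approach from the question: build the allowed character set by hand.
allowed = ''.join(sorted(set(p.printables) - set(excluded)))
manual_key = p.Word(allowed)

# Approach from this answer: let Word drop the excluded characters.
auto_key = p.Word(p.printables, excludeChars=excluded)

samples = ["abc|d", "key-name_1:value", "a.b.c"]
manual_results = [manual_key.parseString(s).asList() for s in samples]
auto_results = [auto_key.parseString(s).asList() for s in samples]

print(manual_results)
print(auto_results)
```

Both lists should match, since each Word stops at the first excluded character.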
