Why are search terms containing spaces not parsed correctly in pyparsing?

import pyparsing as pp

word = pp.Word(pp.printables, excludeChars=":")
word = ("[" + pp.Word(pp.printables + " ", excludeChars=":[]") + "]") | word
non_tag = word + ~pp.FollowedBy(":")

# tagged value is two words with a ":"
tag = pp.Group(word + ":" + word)

# one or more non-tag words - use originalTextFor to get back
# a single string, including intervening white space
phrase = pp.originalTextFor(non_tag[1, ...])

parser = (phrase | tag)[...]

1 answer

User

#1 · Posted 2024-06-26 02:23:13

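I generally discourage people from writing a Word that includes a space as a valid word character. Doing so defeats most lookahead rules and keyword matching. That is why "and" and "or" end up inside your search terms, even though they should probably be logical operators.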
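A minimal sketch of the problem, using a made-up input string: a Word whose character set includes a space consumes the operator together with the words around it, while a Word without spaces stops at every space, so the operator remains a separate token.

```python
import pyparsing as pp

# A Word that accepts spaces swallows the operator along with the terms.
greedy_word = pp.Word(pp.printables + " ")
greedy_result = greedy_word.parseString("red and blue").asList()

# A Word without spaces stops at each space, so "and" stays a separate token.
plain_word = pp.Word(pp.printables)
plain_result = plain_word[1, ...].parseString("red and blue").asList()

print(greedy_result)  # ['red and blue']
print(plain_result)   # ['red', 'and', 'blue']
```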

If this is supposed to be a search string, start by writing a BNF for the search grammar:

word := group of any non-whitespace characters, excluding '":[]'
non_tag := word ~":"
tagged_value := word ':' (quoted_string | word)
phrase := non_tag...

search_term := quoted_string | tagged_value | phrase | '[' search_expr ']'

search_expr_not := NOT? search_term
search_expr_and := search_expr_not ['and' search_expr_not]...
search_expr_or := search_expr_and ['or' search_expr_and]...
search_expr := search_expr_or

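This reuses several expressions just as you defined them. You were definitely on the right track with some of them, such as non_tag and phrase. Where things went wrong was when you tried to handle quoted strings by extending the word expression.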

We also need to define word so that it does not match any of the operator keywords "and", "or", or "not". So we first create expressions for them:

AND, OR, NOT = map(pp.CaselessKeyword, "and or not".split())
any_keyword = AND | OR | NOT
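As a quick check of how these keyword expressions behave, here is a small sketch. The helper `is_keyword` and the sample strings are only illustrative and not part of the grammar. A CaselessKeyword matches regardless of case, but only as a whole word, so the first two samples are reported as keywords and the last two are not.

```python
import pyparsing as pp

AND, OR, NOT = map(pp.CaselessKeyword, "and or not".split())
any_keyword = AND | OR | NOT


def is_keyword(text):
    """Return True if text, taken as a whole, is one of the operator keywords."""
    try:
        any_keyword.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False


# "android" and "nothing" merely start with a keyword, so they are not matched.
for sample in ["AND", "Or", "android", "nothing"]:
    print(sample, is_keyword(sample))
```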

We will also define a dedicated expression for quoted strings (rather than adding '"' to the characters in word):

quoted_string = pp.QuotedString('"')
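For illustration, a short sketch of what this expression returns on two sample inputs (the inputs are made up). By default QuotedString strips the enclosing quotes and keeps everything between them as one token, including spaces and words that would otherwise be operators.

```python
import pyparsing as pp

quoted_string = pp.QuotedString('"')

# The enclosing quotes are removed; the inner text is returned as a single token.
windows_result = quoted_string.parseString('"Microsoft windows (12312)"').asList()
operator_result = quoted_string.parseString('"salt and pepper"').asList()

print(windows_result)   # ['Microsoft windows (12312)']
print(operator_result)  # ['salt and pepper']
```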

Here is the first part of the BNF translated into a pyparsing parser:

COLON = pp.Suppress(":")

word = pp.Combine(~any_keyword + pp.Word(pp.printables, excludeChars=':"\'[]'))

non_tag = word + ~pp.FollowedBy(":")
phrase = pp.originalTextFor(non_tag[1, ...])

# tagged value is a word followed by a ":" and a quoted string or phrase
tagged_value = pp.Group(word + COLON + (quoted_string | phrase))
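To see these pieces working before any operators are added, the following sketch repeats the definitions so that it runs on its own and tries them on a few sample inputs (the inputs are made up for illustration). Note how phrase stops in front of the "and" keyword, and how a tagged value accepts either a quoted string or an unquoted phrase.

```python
import pyparsing as pp

AND, OR, NOT = map(pp.CaselessKeyword, "and or not".split())
any_keyword = AND | OR | NOT
quoted_string = pp.QuotedString('"')
COLON = pp.Suppress(":")

word = pp.Combine(~any_keyword + pp.Word(pp.printables, excludeChars=':"\'[]'))
non_tag = word + ~pp.FollowedBy(":")
phrase = pp.originalTextFor(non_tag[1, ...])
tagged_value = pp.Group(word + COLON + (quoted_string | phrase))

# phrase collects consecutive non-tag words and stops before the keyword.
phrase_result = phrase.parseString("A free text search and key1: x").asList()

# A tagged value with a quoted string, and one with an unquoted phrase.
quoted_tag = tagged_value.parseString('key1: "Microsoft windows (12312)"').asList()
plain_tag = tagged_value.parseString("color: dark red and size: 10").asList()

print(phrase_result)  # ['A free text search']
print(quoted_tag)     # [['key1', 'Microsoft windows (12312)']]
print(plain_tag)      # [['color', 'dark red']]
```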

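Then, to tie things together using "and", "or", and "not" as operators (the last part of the BNF), we use pyparsing's infixNotation method. It looks like you want to use "[]" as grouping characters, so we can specify them as overrides of the default "()" grouping characters.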

We first define search_term from the BNF:

search_term = quoted_string | tagged_value | phrase

Then we use infixNotation to define what a search expression looks like:

search_expr = pp.infixNotation(search_term,
                               [
                                   (NOT, 1, pp.opAssoc.RIGHT),
                                   (AND, 2, pp.opAssoc.LEFT),
                                   (OR, 2, pp.opAssoc.LEFT),
                               ],
                               lpar="[", rpar="]")

With search_expr as the parser, here are the results of parsing the test strings:

parser = search_expr

tests = """\
    A free text search and key1: "Microsoft windows (12312)" and key2: "Sample2" or key3: "Another sample (121212)"
    key: "Microsoft windows (12932)" and hey you how are you?
    """
parser.runTests(tests)

Prints:

A free text search and key1: "Microsoft windows (12312)" and key2: "Sample2" or key3: "Another sample (121212)"
[[['A free text search', 'and', ['key1', 'Microsoft windows (12312)'], 'and', ['key2', 'Sample2']], 'or', ['key3', 'Another sample (121212)']]]
[0]:
  [['A free text search', 'and', ['key1', 'Microsoft windows (12312)'], 'and', ['key2', 'Sample2']], 'or', ['key3', 'Another sample (121212)']]
  [0]:
    ['A free text search', 'and', ['key1', 'Microsoft windows (12312)'], 'and', ['key2', 'Sample2']]
    [0]:
      A free text search
    [1]:
      and
    [2]:
      ['key1', 'Microsoft windows (12312)']
    [3]:
      and
    [4]:
      ['key2', 'Sample2']
  [1]:
    or
  [2]:
    ['key3', 'Another sample (121212)']

key: "Microsoft windows (12932)" and hey you how are you?
[[['key', 'Microsoft windows (12932)'], 'and', 'hey you how are you?']]
[0]:
  [['key', 'Microsoft windows (12932)'], 'and', 'hey you how are you?']
  [0]:
    ['key', 'Microsoft windows (12932)']
  [1]:
    and
  [2]:
    hey you how are you?
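The test strings above do not exercise "not" or the bracket grouping, so here is a further sketch on a made-up query. It repeats the grammar so that it runs on its own. One deliberate difference from the code above: the brackets are passed as explicit pp.Suppress objects, so the bracket characters themselves are dropped from the results.

```python
import pyparsing as pp

AND, OR, NOT = map(pp.CaselessKeyword, "and or not".split())
any_keyword = AND | OR | NOT
quoted_string = pp.QuotedString('"')
COLON = pp.Suppress(":")

word = pp.Combine(~any_keyword + pp.Word(pp.printables, excludeChars=':"\'[]'))
non_tag = word + ~pp.FollowedBy(":")
phrase = pp.originalTextFor(non_tag[1, ...])
tagged_value = pp.Group(word + COLON + (quoted_string | phrase))
search_term = quoted_string | tagged_value | phrase

search_expr = pp.infixNotation(
    search_term,
    [
        (NOT, 1, pp.opAssoc.RIGHT),  # listed first, so it binds most tightly
        (AND, 2, pp.opAssoc.LEFT),
        (OR, 2, pp.opAssoc.LEFT),
    ],
    lpar=pp.Suppress("["),
    rpar=pp.Suppress("]"),
)

# "not" applies to the bracketed group; the result is then combined with "and".
query = "not [red or blue] and size: 10"
grouped_result = search_expr.parseString(query, parseAll=True).asList()
print(grouped_result)
# [[['not', ['red', 'or', 'blue']], 'and', ['size', '10']]]
```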

To actually evaluate these parse results, see the simpleBool.py example in the pyparsing examples directory.
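As a rough idea of what evaluation could look like, here is a hand-rolled sketch that is not taken from simpleBool.py. It walks the nested list produced by the parser and tests it against a record. The matching rules are assumptions made for illustration only: a free-text phrase matches if it occurs in a hypothetical "text" field of the record, and a tagged value matches if the record holds exactly that value under that key. The tree is copied from the second printed result above.

```python
def evaluate(node, record):
    """Evaluate a parsed search tree (nested lists and strings) against a record dict."""
    if isinstance(node, str):
        # Assumption: free text matches if it occurs in the record's "text" field.
        return node.lower() in record.get("text", "").lower()
    if len(node) == 2 and node[0] == "not":
        return not evaluate(node[1], record)
    if len(node) == 2:
        # Assumption: a tagged value [key, value] matches on exact equality.
        key, value = node
        return record.get(key) == value
    # Binary chain [operand, op, operand, ...]; one level holds a single operator.
    operands = [evaluate(child, record) for child in node[0::2]]
    return all(operands) if node[1] == "and" else any(operands)


# Tree copied from the second parse result shown above.
tree = [["key", "Microsoft windows (12932)"], "and", "hey you how are you?"]

matching = {"key": "Microsoft windows (12932)", "text": "Hey you how are you? Fine."}
non_matching = {"key": "Something else", "text": "Hey you how are you? Fine."}

print(evaluate(tree, matching))      # True
print(evaluate(tree, non_matching))  # False
```

When working from an actual parse, the tree would be the first element of the parse result converted with asList(), since infixNotation wraps the whole expression in one outer group.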
