Why isn't a search term containing spaces parsed correctly in pyparsing?

Posted 2024-06-26 02:23:13


My input looks like key: "a word" or anotherkey: "a word (1234)". My problem is with the following grammar:

import pyparsing as pp

word = pp.Word(pp.printables, excludeChars=":")
word = ("[" + pp.Word(pp.printables + " ", excludeChars=":[]") + "]") | word
non_tag = word + ~pp.FollowedBy(":")
# tagged value is two words with a ":"
tag = pp.Group(word + ":" + word)
# one or more non-tag words - use originalTextFor to get back
# a single string, including intervening white space
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]

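When my input is something like key: "value1" and hey you how are you?, the query is parsed into the expected output ([(['key', ':', '"value1"'], {}), 'and hey you how are you?'], {}), but the problem appears as soon as the value after the key contains spaces: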

parser.parseString('key: "Microsoft windows (12932)" and hey you how are you?')
([(['key', ':', '"Microsoft'], {}), 'windows (12932)" and hey you how are you?'], {})

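It breaks between Microsoft and windows. I know pyparsing skips whitespace, but how can I work around this and capture the value up to the end of the phrase, that is, the closing double quote?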


Edit 1: I tried to solve this by adding another word definition, as shown below:

word = ('"' + pp.Word(pp.printables + " ", excludeChars=':"') + '"') | word

It works for a query like key: "windows server (23232)", but not for more complex queries like key1: value and key2: "windows server (1212)". Does anyone have a clue about this? How can I avoid this incorrect behavior?


Edit 2: What do I expect? I need to extend my grammar so that input like this:

key: "Microsoft windows (12932)" and hey you how are you?

is not parsed as:

([(['key', ':', '"Microsoft'], {}), 'windows (12932)" and hey you how are you?'], {})

but as:

([(['key', ':', '"Microsoft windows (12932)"'], {}), 'and hey you how are you?'], {})

Such a query can also be combined with free-text search and more keys, like this:

A free text search and key1: "Microsoft windows (12312)" and key2: "Sample2" or key3: "Another sample (121212)"

This should be parsed as follows:

part1: A free text search and
part2: ['key1', ':', '"Microsoft windows (12312)"']
part3: ['key2', ':', '"Sample2"']
part4: ['key3', ':', '"Another sample (121212)"']

Note: it is fine with me if and/or stay attached to the tokens. I only need to separate the free-text search from the key: value queries.


1 answer
User
Answer 1 · posted 2024-06-26 02:23:13

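I generally discourage people from writing a Word that includes the space as a valid word character. Doing so disables most lookahead rules and keyword matching. That is why "and" and "or" end up inside your search terms, even though they should probably be logical operators.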
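To make that concrete, here is a small sketch of what happens once the space is a word character. The names greedy, swallowed and keyword_reached are made up for this illustration and are not part of the question's grammar.

```python
import pyparsing as pp

# A Word that accepts the space as an ordinary word character (not recommended).
greedy = pp.Word(pp.printables + " ", excludeChars=':"')
AND = pp.CaselessKeyword("and")

text = "windows server and linux"

# The Word swallows the whole input, including the would-be operator.
swallowed = greedy.parseString(text)[0]
print(repr(swallowed))

# So a grammar of the form  term AND term  never gets a chance to see the keyword.
try:
    (greedy + AND + greedy).parseString(text)
    keyword_reached = True
except pp.ParseException:
    keyword_reached = False
print(keyword_reached)
```

Because the first Word consumes everything up to the end of the string, there is nothing left for AND to match, and pyparsing does not backtrack into a Word to give characters back.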

If this is meant to be a search string, then start by writing a BNF for the search syntax:

word := group of any non-whitespace characters, excluding '":[]'
non_tag := word ~":"
tagged_value := word ':' (quoted_string | word)
phrase := non_tag...

search_term := quoted_string | tagged_value | phrase | '[' search_expr ']'

search_expr_not := NOT? search_term
search_expr_and := search_expr_not ['and' search_expr_not]...
search_expr_or := search_expr_and ['or' search_expr_and]...
search_expr := search_expr_or

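This reuses several of the expressions as you defined them. You were definitely on the right track with some of them, such as non_tag and phrase. Where things went wrong was when you tried to handle quoted strings by extending the word expression.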

We also need to define word in a way that does not match any of the operator keywords "and", "or", or "not". So we first create expressions for them:

import pyparsing as pp

AND, OR, NOT = map(pp.CaselessKeyword, "and or not".split())
any_keyword = AND | OR | NOT
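As a quick check of what these keyword expressions accept, the sketch below probes a few strings. The helper is_keyword is hypothetical and exists only for this demonstration.

```python
import pyparsing as pp

AND, OR, NOT = map(pp.CaselessKeyword, "and or not".split())
any_keyword = AND | OR | NOT

def is_keyword(text):
    """Return True if text starts with one of the operator keywords."""
    try:
        any_keyword.parseString(text)
        return True
    except pp.ParseException:
        return False

# Keywords match in any letter case, but only as whole words.
for candidate in ["and", "OR", "Not", "android", "order", "nothing"]:
    print(candidate, is_keyword(candidate))
```

A Keyword differs from a Literal in that it refuses to match when the next character could continue an identifier, so ordinary search words such as "android" or "order" are not mistaken for operators.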

We will also define an expression dedicated to quoted strings (instead of adding the '"' characters to word):

quoted_string = pp.QuotedString('"')
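A short sketch of how QuotedString behaves on the value from the question. By default it strips the enclosing quotes; the second expression, named quoted_keep here purely for illustration, assumes you would rather keep them, as in the output the question asked for.

```python
import pyparsing as pp

quoted_string = pp.QuotedString('"')
text = '"Microsoft windows (12932)" and hey you how are you?'

# Default behavior: spaces inside the quotes are kept, the quotes are removed.
unquoted = quoted_string.parseString(text)[0]
print(unquoted)

# unquoteResults=False keeps the enclosing quote characters in the token.
quoted_keep = pp.QuotedString('"', unquoteResults=False)
kept = quoted_keep.parseString(text)[0]
print(kept)
```

Either way, the expression stops at the closing quote, so the rest of the query is left for the other parts of the grammar.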

Here is the first part of the BNF translated into a pyparsing parser:

COLON = pp.Suppress(":")

word = pp.Combine(~any_keyword + pp.Word(pp.printables, excludeChars=':"\'[]'))

non_tag = word + ~pp.FollowedBy(":")
phrase = pp.originalTextFor(non_tag[1, ...])

# tagged value is a word followed by a ":" and a quoted string or phrase
tagged_value = pp.Group(word + COLON + (quoted_string | phrase))
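The following sketch repeats the definitions so far so that it can be run on its own, and tries the two building blocks on small inputs. The sample strings are illustrative only.

```python
import pyparsing as pp

AND, OR, NOT = map(pp.CaselessKeyword, "and or not".split())
any_keyword = AND | OR | NOT
quoted_string = pp.QuotedString('"')
COLON = pp.Suppress(":")

# A word is a run of printable characters that is not an operator keyword.
word = pp.Combine(~any_keyword + pp.Word(pp.printables, excludeChars=':"\'[]'))
non_tag = word + ~pp.FollowedBy(":")
phrase = pp.originalTextFor(non_tag[1, ...])
tagged_value = pp.Group(word + COLON + (quoted_string | phrase))

# The quoted value is now captured whole, spaces included.
tagged = tagged_value.parseString('key: "Microsoft windows (12932)" and hey')[0].asList()
print(tagged)

# A phrase stops in front of an operator keyword instead of absorbing it.
free_text = phrase.parseString("hey you and me")[0]
print(free_text)
```

Note that parseString is called without parseAll here, so each expression simply stops where it can no longer match and the remainder of the input is ignored.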

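Then, to tie things together using "and", "or", and "not" as operators (the last part of the BNF), we use pyparsing's infixNotation helper. It looks like you want to use "[]" as grouping characters, so we can specify them as overrides of the default "()" grouping characters.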

We first define search_term, following the BNF:

search_term = quoted_string | tagged_value | phrase

Then we use infixNotation to define what a search expression built from those terms looks like:

search_expr = pp.infixNotation(search_term,
                               [
                                   (NOT, 1, pp.opAssoc.RIGHT),
                                   (AND, 2, pp.opAssoc.LEFT),
                                   (OR, 2, pp.opAssoc.LEFT),
                               ],
                               lpar="[", rpar="]")
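The operator list is ordered from highest to lowest precedence, so NOT binds tighter than AND, which binds tighter than OR. The sketch below rebuilds the grammar in one piece and parses a query that uses NOT, which the sample queries do not exercise; the query string is made up for this illustration.

```python
import pyparsing as pp

AND, OR, NOT = map(pp.CaselessKeyword, "and or not".split())
any_keyword = AND | OR | NOT
quoted_string = pp.QuotedString('"')
COLON = pp.Suppress(":")
word = pp.Combine(~any_keyword + pp.Word(pp.printables, excludeChars=':"\'[]'))
non_tag = word + ~pp.FollowedBy(":")
phrase = pp.originalTextFor(non_tag[1, ...])
tagged_value = pp.Group(word + COLON + (quoted_string | phrase))
search_term = quoted_string | tagged_value | phrase

search_expr = pp.infixNotation(
    search_term,
    [
        (NOT, 1, pp.opAssoc.RIGHT),  # unary, highest precedence
        (AND, 2, pp.opAssoc.LEFT),
        (OR, 2, pp.opAssoc.LEFT),    # binary, lowest precedence
    ],
    lpar="[", rpar="]",
)

# NOT applies only to the tagged value, not to the whole AND expression.
parsed = search_expr.parseString('not key: "x" and foo', parseAll=True).asList()
print(parsed)
```

Each operator level that actually matches an operator wraps its operands in a nested group, which is what makes the result straightforward to walk afterwards.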

Using search_expr as the parser, here are the results of parsing the test strings:

parser = search_expr

tests = """\
    A free text search and key1: "Microsoft windows (12312)" and key2: "Sample2" or key3: "Another sample (121212)"
    key: "Microsoft windows (12932)" and hey you how are you?
    """
parser.runTests(tests)

Prints:

A free text search and key1: "Microsoft windows (12312)" and key2: "Sample2" or key3: "Another sample (121212)"
[[['A free text search', 'and', ['key1', 'Microsoft windows (12312)'], 'and', ['key2', 'Sample2']], 'or', ['key3', 'Another sample (121212)']]]
[0]:
  [['A free text search', 'and', ['key1', 'Microsoft windows (12312)'], 'and', ['key2', 'Sample2']], 'or', ['key3', 'Another sample (121212)']]
  [0]:
    ['A free text search', 'and', ['key1', 'Microsoft windows (12312)'], 'and', ['key2', 'Sample2']]
    [0]:
      A free text search
    [1]:
      and
    [2]:
      ['key1', 'Microsoft windows (12312)']
    [3]:
      and
    [4]:
      ['key2', 'Sample2']
  [1]:
    or
  [2]:
    ['key3', 'Another sample (121212)']

key: "Microsoft windows (12932)" and hey you how are you?
[[['key', 'Microsoft windows (12932)'], 'and', 'hey you how are you?']]
[0]:
  [['key', 'Microsoft windows (12932)'], 'and', 'hey you how are you?']
  [0]:
    ['key', 'Microsoft windows (12932)']
  [1]:
    and
  [2]:
    hey you how are you?

To actually evaluate these parse results, see the simpleBool.py example in the pyparsing examples directory.
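As a rough alternative for simple cases, the nested lists printed above can also be walked directly. This is only a sketch under stated assumptions: the function matches, the record layout (a dict with a free-text "text" entry plus one entry per key) and the matching rules (case-insensitive substring for phrases, case-insensitive equality for tagged values) are all hypothetical choices, not something the grammar prescribes.

```python
def matches(node, record):
    """Evaluate a parsed query (nested lists as printed above) against one record."""
    if isinstance(node, str):
        # Bare phrase or quoted string: free-text search (assumed semantics).
        return node.lower() in record.get("text", "").lower()
    if len(node) == 2 and node[0] == "not":
        # Unary NOT group: ['not', operand]
        return not matches(node[1], record)
    if len(node) == 2:
        # Tagged value group: [key, value]
        key, value = node
        return record.get(key, "").lower() == value.lower()
    # Binary group: operand, operator, operand, operator, operand, ...
    operator = node[1]
    operands = [matches(child, record) for child in node[0::2]]
    return all(operands) if operator == "and" else any(operands)

# Parsed form of the second test string, copied from the output above.
query = [["key", "Microsoft windows (12932)"], "and", "hey you how are you?"]
record = {"key": "Microsoft windows (12932)", "text": "Oh, hey you how are you? Long time."}

print(matches(query, record))
print(matches(["not", query], record))
print(matches(query, {"text": "hey you how are you?"}))
```

The ['not', operand] and [key, value] groups are both two-element lists, and they can be told apart only because the word expression never matches an operator keyword, so a key can never be "not".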
