Elasticsearch: regex to match the longest string from a list of strings in an autocomplete suggester

Posted 2024-05-02 22:31:01


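I am very new to Elasticsearch and am trying to implement an autocomplete suggester using a regex query. When I receive a query, I take its last 5 words and build a list of tokens in this format: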

query - i am trying regex for elasticsearch
tokens - [elasticsearch, for elasticsearch, regex for elasticsearch ...]
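
The token list described above can be built by taking the last five words of the query and emitting every suffix phrase, shortest first. A minimal sketch, assuming whitespace-separated words; the function name `build_tokens` and the `max_words` parameter are illustrative and do not come from the original code:

```python
def build_tokens(query, max_words=5):
    """Return suffix phrases of the last `max_words` words, shortest first."""
    words = query.split()[-max_words:]
    # Start at the last word and grow the phrase leftwards one word at a time.
    return [" ".join(words[i:]) for i in range(len(words) - 1, -1, -1)]


tokens = build_tokens("i am trying regex for elasticsearch")
```

For the sample query this yields `elasticsearch`, `for elasticsearch`, `regex for elasticsearch`, and so on up to the five-word phrase.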

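My requirement is to find the indexed sentences that match the longest string in the token list. I am having trouble writing a regex for this. Can anyone help?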

My mapping:

    "mappings": {
      "properties": {
        "keywords": {
          "type": "text",
          "fields": {
            "keywords_suggest": {
              "type": "completion"
            }
          }
        },
        "sections": {
          "type": "text",
          "fields": {
            "sections_suggest": {
              "type": "completion"
            }
          }
        },
        "title": {
          "type": "text",
          "fields": {
            "title_suggest": {
              "type": "completion"
            }
          }
        }
      }
    }

This is how I issue the search request:

    body = {
        "from": 0,
        "size": size,
        "query": {
            "multi_match": {
                "query": query,
                # Ignore these fields; the mapping pasted above only shows
                # the fields used for the completion type.
                "fields": ["title^3", "searchResultPreview^1", "body^5"],
                "fuzziness": "AUTO",
            }
        },
        "suggest": {
            "title-suggest": {
                "regex": regex,
                "completion": {
                    "field": "title.title_suggest",
                    "skip_duplicates": True,
                },
            },
            "keyword-suggest": {
                "regex": regex,
                "completion": {
                    "field": "keywords.keywords_suggest",
                    "skip_duplicates": True,
                },
            },
            "section-suggest": {
                "regex": regex,
                "completion": {
                    "field": "sections.sections_suggest",
                    "skip_duplicates": True,
                },
            },
        },
    }

    search_result = self.es.search(index=index_name, body=body)

Indexed sentence 1 - The Real purpose of elasticsearch is unknown
Indexed sentence 2 - real function is not defined
query - i want to know the real
list of words - [real, the real, know the real, to know the real]

I tried the following regex:

(to know the real|know the real|the real|real)
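
An alternation like the one above can be generated from the token list instead of being written by hand. The sketch below puts the longest alternative first and backslash-escapes the characters that Elasticsearch regular expression syntax reserves. `escape_lucene` and `build_regex` are hypothetical helper names. As far as I understand, the expression is compiled to an automaton, so the order of the alternatives probably documents intent rather than changing which suggestions come back.

```python
# Characters reserved by Elasticsearch (Lucene) regular expression syntax,
# including those used by the optional operators.
LUCENE_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene(text):
    """Backslash-escape reserved characters so they match literally."""
    return "".join("\\" + ch if ch in LUCENE_RESERVED else ch for ch in text)


def build_regex(tokens):
    """Join tokens into one alternation group, longest token first."""
    ordered = sorted(tokens, key=len, reverse=True)
    return "(" + "|".join(escape_lucene(t) for t in ordered) + ")"


regex = build_regex(["real", "the real", "know the real", "to know the real"])
```

Python's `re.escape` is deliberately avoided here because it also escapes spaces and targets Python's own regex dialect rather than the one Elasticsearch uses.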

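Required output: indexed sentence 1 should be matched, because it starts with the longest entry from the list, but the suggester only shows matches for sentences that start with "real".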
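
To make the requirement concrete, the sketch below reproduces the intended selection on the client side in plain Python: for each indexed sentence it finds the longest token that the sentence starts with, ignoring case. It only illustrates the desired result and is not how the completion suggester evaluates the regex; `longest_prefix_token` is a hypothetical helper.

```python
def longest_prefix_token(sentence, tokens):
    """Return the longest token that `sentence` starts with, or None."""
    lowered = sentence.lower()
    matches = [t for t in tokens if lowered.startswith(t.lower())]
    return max(matches, key=len) if matches else None


tokens = ["real", "the real", "know the real", "to know the real"]
sentences = [
    "The Real purpose of elasticsearch is unknown",
    "real function is not defined",
]

# Rank sentences by the length of their longest matching token, best first.
ranked = sorted(
    sentences,
    key=lambda s: len(longest_prefix_token(s, tokens) or ""),
    reverse=True,
)
```

With the sample data, sentence 1 starts with "the real" and sentence 2 only with "real", so sentence 1 ranks first, which is the behaviour the question asks for.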

Can anyone tell me where I went wrong?

Edit: I don't think case sensitivity is the problem, because the match on the word "real" is already case-insensitive.

