Searching for unique values with Elasticsearch in Python

Posted 2024-09-28 19:05:41


I am trying to get the unique values in the "description" field. My data contains many identical descriptions, and I only want the distinct ones.

con.search(index='data', body={
        "aggs": {
            "query": {
                "match": {"description": query_input}
            },
            "size": 30,
            "distinct_description": {
            }
        }


    })

However, this does not work at all. Any suggestions?

For example, given these documents:

{id: 1, state: "OP", description: "hot and humid"}
{id: 2, state: "LO", description: "dry"}
{id: 3, state: "WE", description: "hot and humid"}
{id: 4, state: "OP", description: "green and vegetative"}
{id: 5, state: "HP", description: "dry"}

Expected result:

{id: 1, state: "OP", description: "hot and humid"}
{id: 2, state: "LO", description: "dry"}
{id: 4, state: "OP", description: "green and vegetative"}

1 answer

Answer #1 · posted 2024-09-28 19:05:41

You should try a terms aggregation on the description.keyword sub-field:

body = {
    "query": {
        "match": {"state": query_input}
    },
    "size": 1000,
    "aggs": {
        "distinct_descriptions": {
            "terms": {
                "field": "description.keyword"
            }
        }
    }
}

result = con.search(index='data', body=body)
# Build one entry per distinct description. A fresh dict is created for
# every bucket; appending the same dict repeatedly would make all entries alias.
occurrences_list = []
for res in result["aggregations"]["distinct_descriptions"]["buckets"]:
    occurrences_list.append({
        "description": res["key"],
        "doc_count": res["doc_count"],
        "score": None,
    })

# Attach the score of the first matching hit to each distinct description.
for res in result["hits"]["hits"]:
    for elem in occurrences_list:
        if res["_source"]["description"] == elem["description"]:
            if elem["score"] is None:
                elem["score"] = res["_score"]
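
To make the post-processing step easier to follow, here is a minimal sketch that runs the same kind of loop against a hand-written response dict rather than a live cluster. The helper name `summarize` is hypothetical, and `fake_result` is invented from the example documents in the question (the scores are made up). It assumes the hits arrive in the default order, highest score first, so the first hit seen for a description carries its best score.

```python
def summarize(result):
    """Map each distinct description to its doc_count and first-seen score."""
    summary = {}
    for bucket in result["aggregations"]["distinct_descriptions"]["buckets"]:
        summary[bucket["key"]] = {"doc_count": bucket["doc_count"], "score": None}
    for hit in result["hits"]["hits"]:
        entry = summary.get(hit["_source"]["description"])
        # Keep only the first score seen for each description.
        if entry is not None and entry["score"] is None:
            entry["score"] = hit["_score"]
    return summary


# Invented response, shaped like the part of a search result used above.
fake_result = {
    "hits": {"hits": [
        {"_score": 1.2, "_source": {"id": 1, "state": "OP", "description": "hot and humid"}},
        {"_score": 0.9, "_source": {"id": 3, "state": "WE", "description": "hot and humid"}},
        {"_score": 0.7, "_source": {"id": 2, "state": "LO", "description": "dry"}},
    ]},
    "aggregations": {"distinct_descriptions": {"buckets": [
        {"key": "hot and humid", "doc_count": 2},
        {"key": "dry", "doc_count": 1},
    ]}},
}

summary = summarize(fake_result)
```

Keying the summary by description avoids the nested loop and makes each lookup a single dict access.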

Note that I modified the query and added a size parameter; otherwise Elasticsearch returns only 10 hits by default.
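
If the goal is one representative document per distinct description, as in the expected result in the question, one possible approach is a `top_hits` sub-aggregation inside the terms aggregation. The following is only a sketch under assumptions: the helper names `build_body` and `first_docs` and the sub-aggregation name `first_doc` are made up for illustration, the query matches on `description` as in the question, and `description.keyword` is assumed to exist in the mapping. The terms aggregation also has its own `size` (10 buckets by default), which is set explicitly here.

```python
def build_body(query_input, max_buckets=100):
    """Build a search body that returns one document per distinct description."""
    return {
        "size": 0,  # skip the regular hits; only the aggregation is needed
        "query": {"match": {"description": query_input}},
        "aggs": {
            "distinct_descriptions": {
                "terms": {"field": "description.keyword", "size": max_buckets},
                # Keep a single representative document in every bucket.
                "aggs": {"first_doc": {"top_hits": {"size": 1}}},
            }
        },
    }


def first_docs(result):
    """Extract the representative _source from each bucket of the response."""
    buckets = result["aggregations"]["distinct_descriptions"]["buckets"]
    return [b["first_doc"]["hits"]["hits"][0]["_source"] for b in buckets]
```

With a client such as `con` from the question, this would be used as `first_docs(con.search(index='data', body=build_body(query_input)))`.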
