Aggregating a field in elasticsearch-dsl using Python

2 Answers

User

Answer 1 · edited 2024-05-20 00:39:10

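I don't have the reputation to comment yet, but I want to make a small correction to Matthew's comment on VISQL's answer regarding from_dict. If you want to keep the properties already set on the search, use update_from_dict rather than from_dict.

According to the docs, from_dict creates a new Search object, whereas update_from_dict modifies the existing one in place. That is what you want when the Search already has properties such as index, using, and so on.

So declare the query body before the search, then create the search like this: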

query_body = {
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {
                "field": "house_number",
                "size": 0
            }
        }
    }
}

s = Search(using=client, index="airbnb", doc_type="sleep_overs").update_from_dict(query_body)

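User

Answer 2 · edited 2024-05-20 00:39:10

First off, I notice that what I wrote here does not actually define any aggregation. The documentation on how to use this is not very readable to me, so I'll expand on what I wrote above. I'm changing the index name to make a nicer example.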

from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])

s = Search(using=client, index="airbnb", doc_type="sleep_overs")

# Invalid at this point: no aggregation has been defined yet, so the
# response would have nothing under `aggregations`.
#response = s.execute()
#for tag in response.aggregations.per_tag.buckets:
#    print(tag.key)

# Let's make an aggregation
# 'by_house' is a name you choose, 'terms' is a keyword for the type of aggregator
# 'field' is also a keyword, and 'house_number' is a field in our ES index
s.aggs.bucket('by_house', 'terms', field='house_number', size=0)

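Above we create one bucket per house number, so each bucket's name (its key) is the house number. Elasticsearch (ES) always provides the count of documents that fall into each bucket. size=0 means to use all results, since ES has a default setting of returning only 10 results (or whatever your dev set it up to do).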

# This runs the query.
response = s.execute()

# Let's see what's in our results.
# A terms aggregation has no top-level doc_count; the counts live on each bucket.
print(response.hits.total)
print(response.aggregations.by_house.buckets)

for item in response.aggregations.by_house.buckets:
    print(item.doc_count)
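My earlier mistake was thinking that an Elasticsearch query has aggregations by default. It does not: you define them yourself, then execute the search. The response can then be split up by the aggregations you defined.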
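To make the "split up by aggregation" point concrete, here is a minimal sketch of the shape a terms aggregation comes back in. It uses a hand-written sample dict rather than a live cluster; the house numbers and counts are invented for illustration, and `bucket_counts` is a hypothetical helper, not part of elasticsearch-dsl.

```python
# Hand-written sample mimicking the relevant parts of an ES search response.
# The keys and counts are made up for illustration.
sample_response = {
    "hits": {"total": 5, "hits": []},
    "aggregations": {
        "by_house": {
            "buckets": [
                {"key": 12, "doc_count": 3},
                {"key": 7, "doc_count": 2},
            ]
        }
    },
}


def bucket_counts(response, agg_name):
    """Map each bucket key to its document count for one named aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = bucket_counts(sample_response, "by_house")
for house_number, doc_count in counts.items():
    print(house_number, doc_count)
```

With elasticsearch-dsl the same buckets are reached through attribute access on the response, as in the `aggregations.by_house.buckets` loop above.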

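The cURL equivalent of the above should look like this.
Note: I use Sense, an Elasticsearch plugin/extension/add-on for Google Chrome. In Sense you can use // to comment things out.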

POST /airbnb/sleep_overs/_search
{
// the size 0 here actually means to not return any hits, just the aggregation part of the result
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {
// the size 0 here means to return all results, not just the default 10 results
                "field": "house_number",
                "size": 0
            }
        }
    }
}
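Outside Sense the `//` comments are not valid JSON, so one option is to keep the body as a Python dict, put the notes in Python comments, and serialize it when needed. A minimal sketch, where `terms_agg_body` and its `bucket_size` parameter are hypothetical names introduced here for illustration:

```python
import json


def terms_agg_body(agg_name, field, bucket_size=0):
    """Build a search body with a single terms aggregation and no hits."""
    return {
        "size": 0,  # return no hits, only the aggregation part of the result
        "aggs": {
            agg_name: {
                "terms": {
                    "field": field,
                    # 0 meant "all buckets" on the ES version used in this answer
                    "size": bucket_size,
                }
            }
        },
    }


body = terms_agg_body("by_house", "house_number")
payload = json.dumps(body, indent=4)
print(payload)
```

As far as I know, Elasticsearch 5.0 and later no longer accept `size: 0` inside a terms aggregation, so on those versions you would pass an explicit `bucket_size` instead.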

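Workaround. Someone on the DSL's Git repository told me to forget about translating and just use this method. It is simpler, and you can write the tough stuff the cURL way. That is why I call it a workaround.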

# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])
s = Search(using=client, index="airbnb", doc_type="sleep_overs")

# how simple: we just paste the cURL body here
body = {
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {
                "field": "house_number",
                "size": 0
            }
        }
    }
}

# from_dict builds a new Search, so set the index and doc_type afterwards
s = Search.from_dict(body)
s = s.index("airbnb")
s = s.doc_type("sleep_overs")
body = s.to_dict()

t = s.execute()

for item in t.aggregations.by_house.buckets:
    # item.key will be the house number
    print(item.key, item.doc_count)

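Hope this helps. I now design everything in cURL first, then use Python statements to peel away at the results to get what I want. This helps with multi-level aggregations (sub-aggregations).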
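As a sketch of the sub-aggregation case: the body below nests a second terms aggregation inside `by_house`, and a small recursive walk flattens the nested buckets into rows. The `by_guest` aggregation, the `guest_name` field, the sample response data, and the `flatten_buckets` helper are all assumptions made up for this illustration.

```python
# Request body: a sub-aggregation goes in an "aggs" key inside its parent.
body = {
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {"field": "house_number"},
            "aggs": {
                "by_guest": {"terms": {"field": "guest_name"}}
            },
        }
    },
}

# Hand-written sample of the matching "aggregations" section (invented data).
sample_aggs = {
    "by_house": {
        "buckets": [
            {
                "key": 12,
                "doc_count": 3,
                "by_guest": {
                    "buckets": [
                        {"key": "ann", "doc_count": 2},
                        {"key": "bob", "doc_count": 1},
                    ]
                },
            },
            {
                "key": 7,
                "doc_count": 1,
                "by_guest": {"buckets": [{"key": "ann", "doc_count": 1}]},
            },
        ]
    }
}


def flatten_buckets(aggs, path=()):
    """Yield (keys..., doc_count) for the innermost buckets of nested aggs."""
    for agg_result in aggs.values():
        for bucket in agg_result["buckets"]:
            keys = path + (bucket["key"],)
            # Any dict value holding "buckets" is a nested sub-aggregation.
            children = {
                name: value
                for name, value in bucket.items()
                if isinstance(value, dict) and "buckets" in value
            }
            if children:
                yield from flatten_buckets(children, keys)
            else:
                yield keys + (bucket["doc_count"],)


rows = list(flatten_buckets(sample_aggs))
for row in rows:
    print(row)
```

Each row carries the bucket keys from the outermost level inward, followed by the innermost document count, which is usually the form that is easiest to load into a table afterwards.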
