I want to retrieve a field together with its normalized version from Elasticsearch.
Here are my index definition and data:
PUT normalizersample
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "refresh_interval": "60s",
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "filter": [
            "lowercase",
            "german_normalization",
            "asciifolding"
          ],
          "type": "custom"
        }
      }
    }
  },
  "mappings": {
    "_source": {
      "enabled": true
    },
    "properties": {
      "myField": {
        "type": "text",
        "store": true,
        "fields": {
          "keyword": {
            "type": "keyword",
            "store": true
          },
          "normalized": {
            "type": "keyword",
            "store": true,
            "normalizer": "my_normalizer"
          }
        }
      }
    }
  }
}
POST normalizersample/_doc/1
{
  "myField": ["Andreas", "Ämdreas", "Anders"]
}
My first approach was to use script fields, like this:
GET /normalizersample/_search
{
  "size": 100,
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "keyword": {
      "script": "doc['myField.keyword']"
    },
    "normalized": {
      "script": "doc['myField.normalized']"
    }
  }
}
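However, because myField is an array, this returns two lists of strings per ES document, and each list is sorted alphabetically on its own. Because of the normalization, corresponding entries may therefore not line up with each other: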
"hits" : [
  {
    "_index" : "normalizersample",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "fields" : {
      "normalized" : [
        "amdreas",
        "anders",
        "andreas"
      ],
      "keyword" : [
        "Anders",
        "Andreas",
        "Ämdreas"
      ]
    }
  }
]
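To make the mismatch concrete, here is a small Python sketch that reproduces it offline. It assumes only the three values from the document above and the normalized forms shown in the response; the names `source_values` and `normalized_of` are made up for illustration and are not part of any Elasticsearch API.

```python
# Values of myField in the order they were indexed (from the document above).
source_values = ["Andreas", "Ämdreas", "Anders"]

# Normalized forms as they appear in the search response above.
normalized_of = {"Andreas": "andreas", "Ämdreas": "amdreas", "Anders": "anders"}

# Each sub-field's values come back sorted independently of the other.
keyword_sorted = sorted(source_values)
normalized_sorted = sorted(normalized_of[v] for v in source_values)

# Pairing the two lists by position therefore mixes up the entries.
pairs_by_position = list(zip(keyword_sorted, normalized_sorted))
mismatched = [(k, n) for k, n in pairs_by_position if normalized_of[k] != n]

print(pairs_by_position)
print(len(mismatched), "of", len(pairs_by_position), "pairs are wrong")
```

For this document, none of the positional pairs links a value to its own normalized form.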
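Instead, I would like to retrieve [(Andreas, andreas), (Ämdreas, amdreas), (Anders, anders)] or a similar format in which I can match each entry to its normalized form. The only way I have found is to call term vectors on both fields, since both contain the position field, but that seems like a huge overhead to me. (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html)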
Is there a simpler way to retrieve tuples of the keyword value and its normalized value?
Thanks a lot.
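One direction that might be worth trying, sketched here under assumptions rather than verified against this cluster: read the values from `_source` (which keeps the array order) and send them to the `_analyze` API with `"normalizer": "my_normalizer"`, then zip the two lists. The sketch assumes that `_analyze` returns exactly one token per input string, in input order. The helper names `build_analyze_body` and `pair_with_normalized` are hypothetical, and `sample_response` is hand-written for illustration and trimmed to the `token` key.

```python
def build_analyze_body(values, normalizer="my_normalizer"):
    """Body for GET normalizersample/_analyze (normalizer name from the index settings)."""
    return {"normalizer": normalizer, "text": list(values)}


def pair_with_normalized(values, analyze_response):
    """Zip source values with the tokens of an _analyze response.

    Assumes one token per input value, in input order; raises if the
    counts differ instead of silently producing wrong pairs.
    """
    tokens = [t["token"] for t in analyze_response.get("tokens", [])]
    if len(tokens) != len(values):
        raise ValueError(
            f"expected {len(values)} tokens, got {len(tokens)}"
        )
    return list(zip(values, tokens))


# Values as they would appear in hit["_source"]["myField"].
source_values = ["Andreas", "Ämdreas", "Anders"]

# Hand-written stand-in for the _analyze response (assumption, not real output).
sample_response = {
    "tokens": [{"token": "andreas"}, {"token": "amdreas"}, {"token": "anders"}]
}

body = build_analyze_body(source_values)
pairs = pair_with_normalized(source_values, sample_response)
print(pairs)
```

This costs one extra request per batch of values, so whether it is actually cheaper than term vectors would need to be measured.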