I am running Elasticsearch 7.9.0 and executing this query:
curl -XGET 'https://somehost:9200/index_name/_search' -H 'Content-Type: application/json' -d '{
"size": 10,
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.query_vector, \u0027title_embed\u0027) + 1.0",
"params": {
"query_vector": [-0.19277021288871765, 0.10494251549243927,.......]}
}
}
}
}'
Note: query_vector is a 768-dimensional vector generated by BERT.
Note: \u0027 is the Unicode escape for a single quote.
I got this error in the response:
"cosineSimilarity(params.query_vector, 'title_embed') + 1.0","
^---- HERE"],"script":"cosineSimilarity(params.query_vector, 'title_embed') +
1.0","lang":"painless","position":{"offset":38,"start":0,"end":58},"caused_by":
{"type":"class_cast_exception","reason":"class
org.elasticsearch.index.fielddata.ScriptDocValues$Doubles cannot be cast to class
org.elasticsearch.xpack.vectors.query.VectorScriptDocValues$DenseVectorScriptDocValues
(org.elasticsearch.index.fielddata.ScriptDocValues$Doubles is in unnamed module of loader 'app';
org.elasticsearch.xpack.vectors.query.VectorScriptDocValues$DenseVectorScriptDocValues is in
unnamed module of loader java.net.FactoryURLClassLoader @715fb77)"}}}]},"status":400}
Although title_embed is declared as Elasticsearch's dense_vector type in the index mapping, the error says it is a double, and I don't know why.
Here is the mapping:
"mappings": {
"properties": {
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"domain": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"link": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"pub_date": {
"type": "date"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title_embed": {
"type": "dense_vector",
"dims": 768
},
"description_embed": {
"type": "dense_vector",
"dims": 768
}
}
}
When I run the same query from Python, I get the same error:
status_code, error_message, additional_info
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', "class_cast_exception: class org.elasticsearch.index.fielddata.ScriptDocValues$Doubles cannot be cast to class org.elasticsearch.xpack.vectors.query.VectorScriptDocValues$DenseVectorScriptDocValues (org.elasticsearch.index.fielddata.ScriptDocValues$Doubles is in unnamed module of loader 'app'; org.elasticsearch.xpack.vectors.query.VectorScriptDocValues$DenseVectorScriptDocValues is in unnamed module of loader java.net.FactoryURLClassLoader @6d91790b)")
If possible, check that the number of values in "query_vector" equals the number of dimensions declared in the mapping, i.e. dims: 768.
I suggest checking the mapping again, by retrieving it from the live index, to confirm that it is what you expect.
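As a rough sketch of that check: the mapping of a live index can be retrieved with `GET /index_name/_mapping`, and the type of `title_embed` read from the response. The helper `field_type` and both sample responses below are hypothetical illustrations, not output from the asker's cluster. One possible explanation for a `ScriptDocValues$Doubles` cast error is that the field in the live index is a plain numeric type (for example because documents were indexed before the dense_vector mapping was applied), but that is an assumption to verify against your own mapping.

```python
def field_type(mapping_response, index_name, field):
    """Return the mapped type of `field`, or None if the field is not mapped.

    `mapping_response` is assumed to have the shape returned by
    GET /<index>/_mapping in Elasticsearch 7.x.
    """
    properties = mapping_response[index_name]["mappings"].get("properties", {})
    return properties.get(field, {}).get("type")


# Hypothetical response: the field ended up as a numeric type in the live index.
numeric_mapping = {
    "index_name": {"mappings": {"properties": {"title_embed": {"type": "float"}}}}
}

# Hypothetical response: the field has the intended dense_vector mapping.
vector_mapping = {
    "index_name": {
        "mappings": {"properties": {"title_embed": {"type": "dense_vector", "dims": 768}}}
    }
}

for name, response in [("numeric", numeric_mapping), ("vector", vector_mapping)]:
    print(name, "->", field_type(response, "index_name", "title_embed"))
```

If the live index reports anything other than dense_vector for title_embed, the mapping shown in the question is not the one actually in effect.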
Also, you may have dropped a value when passing "query_vector".
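A minimal sketch of that length check, done on the client before the request is sent. The function name `check_dims` is hypothetical, and the vector here is filled with placeholder zeros rather than real BERT output.

```python
def check_dims(vector, expected_dims):
    """Return True if the vector length matches the mapped dims, else raise."""
    if len(vector) != expected_dims:
        raise ValueError(
            f"vector has {len(vector)} values, mapping expects {expected_dims}"
        )
    return True


# Placeholder vector standing in for the 768-dimensional BERT embedding.
query_vector = [0.0] * 768
check_dims(query_vector, 768)
```

Running the same check on every embedding before it is indexed would catch a mismatch on the document side as well.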
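I ran a local test, but with 3-dimensional vectors.
The mapping for title_embed used dims 3 with type "dense_vector".
I ingested some data under that mapping, then tried replicating your query with the lower vector dimension described above.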
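The request bodies from that local test are not preserved here, so the following is only a sketch of what a 3-dimensional reproduction could look like. The document, the vector values, and the helper `cosine_plus_one` are all hypothetical. The bodies are built as Python dicts and the expected score is computed locally, so the script formula can be sanity-checked without a cluster.

```python
import json
import math

# Assumed mapping for the reduced test: same field name, only 3 dimensions.
mapping = {
    "mappings": {"properties": {"title_embed": {"type": "dense_vector", "dims": 3}}}
}

# Hypothetical document and query vector (illustrative values only).
doc = {"title": "example", "title_embed": [1.0, 0.0, 0.0]}
query_vector = [1.0, 1.0, 0.0]

# Same query shape as in the question, with the shorter vector.
search_body = {
    "size": 10,
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'title_embed') + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    },
}


def cosine_plus_one(a, b):
    """Compute cosine similarity of a and b, shifted by 1.0 like the script."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm + 1.0


score = cosine_plus_one(query_vector, doc["title_embed"])
print(json.dumps(search_body, indent=2))
print("locally computed score:", score)
```

When the body is serialized from Python, the single quotes around the field name need no \u0027 escape; that escape is only needed in the curl command because the whole body sits inside shell single quotes.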
Note: as Tom Elias mentioned, using doc['title_embed'] works, but it is deprecated in version 7.9.0.
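One small suggestion: try reducing the vector dimension, both in the mapping and in the data you ingest into the index. For example, if you use 5 dimensions, make sure that "dims" in the mapping is 5 and that both the ingested vectors and "query_vector" contain exactly 5 values.
If the reduced test works but 768 dimensions still fails, my guess is that there may be an internal limit on the number of dimensions allowed.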
Useful links: https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-script-score-query.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html