Aggregation Queries (Grouped Statistics) with Elasticsearch
2020-09-21
觉释
GET test_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags"
      }
    }
  }
}
This request fails with the following error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test_index",
        "node" : "lt3frTKnQ7aUqcNa4CT_ww",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    }
  },
  "status" : 400
}
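Error message: Set fielddata=true on [xxxx] ......
Analysis: by default, Elasticsearch disables fielddata on fields of type text.
A text field is analyzed (tokenized) at index time, whereas an aggregation needs per-document access to the field's values, which the inverted index alone does not provide.
To aggregate directly on a text field, set fielddata=true in the mapping, so that Elasticsearch uninverts the inverted index (in effect building a forward index) and loads the result into memory. Note that the loaded values are the analyzed tokens, not the original field values.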
Solutions
Solution 1: enable fielddata on the text field:
Set fielddata to true on the text field you want to group by (here, tags):
PUT test_index/_mapping/
{
  "properties": {
    "tags": {
      "type": "text",
      "fielddata": true
    }
  }
}
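Solution 2: use the built-in keyword sub-field:
Enabling fielddata can consume a large amount of memory, so aggregate on the text field's built-in keyword sub-field instead. Under the default dynamic mapping, a string field is indexed as text with a keyword sub-field, here tags.keyword.
GET test_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": {
        "field": "tags.keyword"
      }
    }
  }
}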
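The tags.keyword sub-field exists only if the mapping defines it, which is the case when the field was created by the default dynamic mapping. The sketch below builds an explicit equivalent as a Python dict. It assumes Elasticsearch 7.x typeless mappings and a body sent with an index-creation request; the function name explicit_tags_mapping is hypothetical, and ignore_above=256 simply mirrors the dynamic-mapping default.

```python
import json


def explicit_tags_mapping(ignore_above=256):
    """Return an index-creation body that maps `tags` as text with a keyword sub-field.

    Assumption: Elasticsearch 7.x typeless mappings.
    """
    return {
        "mappings": {
            "properties": {
                "tags": {
                    "type": "text",  # analyzed, used for full-text search
                    "fields": {
                        # Unanalyzed copy of the same value, used for aggregations and sorting.
                        "keyword": {"type": "keyword", "ignore_above": ignore_above}
                    },
                }
            }
        }
    }


mapping_body = explicit_tags_mapping()
print(json.dumps(mapping_body, indent=2))
```

The printed JSON can be pasted as the body of an index-creation request; values longer than ignore_above characters are not indexed in the keyword sub-field and therefore do not appear in its buckets.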
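Search first, then aggregate
(1) Count the documents per tag among the documents whose name contains "zhangsan". In the request below, group_by_tags is the name of the aggregation result and is user-defined; the aggregation uses the built-in keyword sub-field. Request syntax:
GET test_index/_search
{
  "query": {
    "match": { "name": "zhangsan" }
  },
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags.keyword" }
    }
  }
}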
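Further notes: aggregating with fielddata versus keyword
Once fielddata is enabled on a text field, an aggregation on that field buckets each analyzed token separately, which in most cases is not the result we want.
The built-in keyword sub-field is not analyzed, so buckets are formed from whole field values, and the result is usually more useful.
Recommendation: aggregate on the built-in keyword sub-field of a text field.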
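The difference can be illustrated with a toy simulation in plain Python. This is only a sketch: the sample tag values are hypothetical, and analyze() is a rough stand-in (lowercase, split on whitespace) for a real analyzer, not Elasticsearch's actual implementation.

```python
from collections import Counter

# Hypothetical values of the `tags` field, one per document (not from the article's index).
docs = ["Java Web", "Java Core", "Search Engine"]


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def terms_on_fielddata(values):
    """Bucket by analyzed token, counting each document once per distinct token."""
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    return dict(counts)


def terms_on_keyword(values):
    """Bucket by the whole, unanalyzed field value."""
    return dict(Counter(values))


fielddata_buckets = terms_on_fielddata(docs)
keyword_buckets = terms_on_keyword(docs)
print(fielddata_buckets)
print(keyword_buckets)
```

With fielddata, "Java Web" and "Java Core" both land in a java bucket and the original tag values are lost; with keyword, each original value is its own bucket.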
Group first, then compute statistics
(1) Group by tags, then compute the average book price for each tag. Request syntax:
GET test_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_tags": {
      "terms": { "field": "tags.keyword" },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
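In the response, each bucket of group_by_tags carries its key, its doc_count, and the nested avg_price result under a value key. A minimal sketch of reading that structure in Python follows; the function name avg_price_by_tag is hypothetical, and the sample response is made-up data that only mirrors the shape of a terms plus avg response.

```python
def avg_price_by_tag(response):
    """Map each tag bucket key to the average price computed for that bucket."""
    buckets = response["aggregations"]["group_by_tags"]["buckets"]
    return {bucket["key"]: bucket["avg_price"]["value"] for bucket in buckets}


# Hypothetical response body; the numbers are illustrative, not real results.
sample_response = {
    "aggregations": {
        "group_by_tags": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "java", "doc_count": 2, "avg_price": {"value": 75.0}},
                {"key": "search", "doc_count": 1, "avg_price": {"value": 120.0}},
            ],
        }
    }
}

result = avg_price_by_tag(sample_response)
print(result)
```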
Group, sub-group within each group, then compute statistics and sort
(1) Group by price range, then group by tags within each range, and compute the average price of each tags group. Query syntax:
GET test_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 0, "to": 100 },
          { "from": 100, "to": 150 }
        ]
      },
      "aggs": {
        "group_by_tags": {
          "terms": { "field": "tags.keyword" },
          "aggs": {
            "avg_price": {
              "avg": { "field": "price" }
            }
          }
        }
      }
    }
  }
}
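The query above leaves the tag buckets in the default order of a terms aggregation (descending doc_count). One way to sort them by average price instead is the order option of the terms aggregation, which can reference a sub-aggregation by name. The sketch below builds such a request body in Python; the function name build_sorted_request and the direction parameter are hypothetical, introduced only for illustration.

```python
import json


def build_sorted_request(direction="desc"):
    """Build the range -> terms -> avg request body, ordering tag buckets by avg_price."""
    return {
        "size": 0,
        "aggs": {
            "group_by_price": {
                "range": {
                    "field": "price",
                    "ranges": [
                        {"from": 0, "to": 100},
                        {"from": 100, "to": 150},
                    ],
                },
                "aggs": {
                    "group_by_tags": {
                        "terms": {
                            "field": "tags.keyword",
                            # Sort buckets by the avg_price sub-aggregation defined just below.
                            "order": {"avg_price": direction},
                        },
                        "aggs": {"avg_price": {"avg": {"field": "price"}}},
                    }
                },
            }
        },
    }


body = build_sorted_request()
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of GET test_index/_search; passing "asc" reverses the order within each price range.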