通过 Elasticsearch 实现聚合检索 (分组统计)

2020-09-21 本文已影响0人觉释

GET test_index/_search
{
    "size": 0,           
    "aggs": {
        "group_by_tags": { 
            "terms": {
                "field": "tags"
            }
        }
    }
}

报错

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "test_index",
        "node" : "lt3frTKnQ7aUqcNa4CT_ww",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory.",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [tags] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    }
  },
  "status" : 400
}

错误信息: Set fielddata=true on [xxxx] ......
错误分析: 默认情况下, Elasticsearch 对 text 类型的字段(field)禁用了 fielddata;
text 类型的字段在创建索引时会进行分词处理, 而聚合操作必须基于字段的原始值进行分析;
所以如果要对 text 类型的字段进行聚合操作, 就需要存储其原始值 —— 创建mapping时指定fielddata=true, 以便通过反转倒排索引(即正排索引)将索引数据加载至内存中.

解决方法

解决方案一: 对text类型的字段开启fielddata属性:

将要分组统计的text field(即tags)的fielddata设置为true:

PUT test_index/_mapping/
{
    "properties": {
        "tags": {
            "type": "text",
            "fielddata": true
        }
    }
}

解决方法二: 使用内置keyword字段:

开启fielddata将占用大量的内存.

GET  test_index/_search
{
   "size": 0,
   "aggs": {
       "group_by_tags": {
           "terms": {
               "field": "tags.keyword" // 使用text类型的内置keyword字段
           }
       }
   }
}

先检索, 再聚合

#######(1) 统计name中含有“zhangsan”的中每个tag的文档数量, 请求语法:

GET test_index/_search
{
    "query": {
        "match": { "name": "zhangsan" }
    }, 
    "aggs": {
        "group_by_tags": {  // 聚合结果的名称, 需要自定义. 下面使用内置的keyword字段: 
            "terms": { "field": "tags.keyword" }
        }
    }
}

扩展: fielddata和keyword的聚合比较
为某个 text 类型的字段开启fielddata字段后, 聚合分析操作会对这个字段的所有分词分别进行聚合, 获得的结果大多数情况下并不符合我们的需求.

使用keyword内置字段, 不会对相关的分词进行聚合, 结果可能更有用.

—— 推荐使用text类型字段的内置keyword进行聚合操作.

先分组, 再聚合统计

(1) 先按tags分组, 再计算每个tag下图书的平均价格, 请求语法:

GET test_index/_search
{
    "size": 0, 
    "aggs": {
        "group_by_tags": {
            "terms": { "field": "tags.keyword" },
            "aggs": {
                "avg_price": {
                    "avg": { "field": "price" }
                }
            }
        }
    }
}

先分组, 组内再分组, 然后统计、排序

(1) 先按价格区间分组, 组内再按tags分组, 计算每个tags组的平均价格, 查询语法:

GET test_index/_search
{
    "size": 0, 
    "aggs": {
        "group_by_price": {
            "range": {
                "field": "price", 
                "ranges": [
                    { "from": 00,  "to": 100 },
                    { "from": 100, "to": 150 }
                ]
            }, 
            "aggs": {
                "group_by_tags": {
                    "terms": { "field": "tags.keyword" }, 
                    "aggs": {
                        "avg_price": {
                            "avg": { "field": "price" }
                        }
                    }
                }
            }
        }
    }
}