Advanced ES Queries - Notes Kept as Worked Examples

2022-11-13 右耳菌

1. Suggesters: Query Suggestions

1.1 Term suggester

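There is not much to memorize here; the key is the three values of suggest_mode:

missing:
Suggestions are returned only for input terms that do not exist in the index (typically misspellings). A term that is already in the index gets no suggestions. This is the default mode.

# Returns the suggestion "bristol"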
POST /bank/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "text": "bristl",
      "term": {
        "field": "address",
        "suggest_mode": "missing"
      }
    }
  }
}

# "bristol" is spelled correctly and already in the index, so no suggestion is returned
POST /bank/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "text": "bristol",
      "term": {
        "field": "address",
        "suggest_mode": "missing"
      }
    }
  }
}

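popular:
Suggests only similar terms that occur in more documents than the input term itself.
always:
Suggests every similar term that matches, whether or not the input term is in the index.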
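
The three modes can be sketched as a toy model in Python. This is only an illustration, not how ES implements it: the function name suggest and the doc_freq dictionary are hypothetical, and difflib similarity stands in for the edit-distance matching that the real term suggester uses.

```python
from difflib import get_close_matches


def suggest(term, doc_freq, mode="missing"):
    """Toy model of suggest_mode; doc_freq maps indexed term -> document count."""
    if mode == "missing" and term in doc_freq:
        return []  # the term exists in the index, so nothing is suggested
    # Indexed terms that look similar to the input, excluding the input itself.
    similar = get_close_matches(term, list(doc_freq), n=5, cutoff=0.7)
    candidates = [t for t in similar if t != term]
    if mode == "popular":
        own = doc_freq.get(term, 0)
        # Keep only candidates that occur in more documents than the input term.
        candidates = [t for t in candidates if doc_freq[t] > own]
    return candidates


# Hypothetical document frequencies for the "address" field.
doc_freq = {"bristol": 12, "bristle": 3, "street": 40}
print(suggest("bristl", doc_freq, "missing"))
print(suggest("bristol", doc_freq, "missing"))
print(suggest("bristol", doc_freq, "always"))
```

With this toy data, "bristol" gets no suggestion in missing or popular mode, while always mode still proposes the rarer "bristle".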

1.2 Phrase Suggester

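This one is a little abstract. The phrase suggester takes a whole phrase, checks it against the terms indexed in a field, and proposes corrected phrases. Suppose I type I saw your face in a crowde plac, while the songs_v1 index contains crowded and place rather than crowde and plac. The response then looks something like this: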

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "MY_SUGGESTION" : [
      {
        "text" : "I saw your face in a crowde plac",
        "offset" : 0,
        "length" : 32,
        "options" : [
          {
            "text" : "i saw your face in a crowded place",
            "highlighted" : "i saw your face in a <em>crowded place</em>",
            "score" : 4.4348968E-7
          },
          {
            "text" : "i saw your face in a crowded plan",
            "highlighted" : "i saw your face in a <em>crowded plan</em>",
            "score" : 3.7188448E-7
          },
          {
            "text" : "i saw your face in a crowded plac",
            "highlighted" : "i saw your face in a <em>crowded</em> plac",
            "score" : 3.0497068E-7
          },
          {
            "text" : "i saw your face in a crowde place",
            "highlighted" : "i saw your face in a crowde <em>place</em>",
            "score" : 2.9133045E-7
          },
          {
            "text" : "i saw your face in a crowde plan",
            "highlighted" : "i saw your face in a crowde <em>plan</em>",
            "score" : 2.4429264E-7
          }
        ]
      }
    ]
  }
}

The request that produces this response:

POST /songs_v1/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "text": "I saw your face in a crowde plac",
      "phrase": {
        "field": "lyrics",
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

1.3 Completion suggester: Autocomplete

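The completion suggester stores its entries in an FST (finite state transducer) and answers requests with a prefix lookup.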
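
To make "prefix lookup" concrete, here is a minimal Python sketch. It is not an FST: a sorted list with binary search stands in for it, and the names build_index and complete are hypothetical. Lowercasing the keys is an assumption that mimics what the analyzer does at index time.

```python
from bisect import bisect_left


def build_index(titles):
    """Sort (lowercased key, original) pairs so shared prefixes sit next to each other."""
    return sorted((t.lower(), t) for t in titles)


def complete(index, prefix, size=5):
    """Return up to `size` originals whose lowercased form starts with `prefix`."""
    key = prefix.lower()
    start = bisect_left(index, (key,))  # first entry >= the prefix
    matches = []
    for lowered, original in index[start:]:
        if not lowered.startswith(key):
            break  # sorted order: no later entry can match
        matches.append(original)
        if len(matches) == size:
            break
    return matches


index = build_index([
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "the elk stack rocks",
    "elasticsearch is rock build",
])
print(complete(index, "elastic"))
print(complete(index, "rocks"))
```

The second call returns nothing: matching starts at the beginning of each entry, so a word in the middle of an entry is never found.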

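Setup

# completion suggester
# Autocomplete: FST-backed prefix lookup

# Mapping a field as type "completion" makes ES build the FST for it.
# The mapping type ("tech") and "_type" used here are ES 6.x syntax; later major versions removed mapping types.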
PUT /blogs_completion
{
  "mappings": {
    "tech": {
      "properties": {
        "body": {
          "type":"completion" 
        }
      }
    }
  }
}


POST _bulk/?refresh=true
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "Lucene is cool"}
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "Elasticsearch builds on top of lucene"}
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "Elasticsearch rocks"}
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "Elastic is the company behind ELK stack"}
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "the elk stack rocks"}
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "elasticsearch is rock build"}

Example query:

POST /blogs_completion/_search
{
  "size": 0,
  "suggest": {
    "MY_SUGGESTION": {
      "text": "elastic",
      "completion": {
        "field": "body"
      }
    }
  }
}

Response:

{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "MY_SUGGESTION" : [
      {
        "text" : "elastic",
        "offset" : 0,
        "length" : 7,
        "options" : [
          {
            "text" : "Elastic is the company behind ELK stack",
            "_index" : "blogs_completion",
            "_type" : "tech",
            "_id" : "JSA5cIQBEjlPnVjPKJpo",
            "_score" : 1.0,
            "_source" : {
              "body" : "Elastic is the company behind ELK stack"
            }
          },
          {
            "text" : "Elasticsearch builds on top of lucene",
            "_index" : "blogs_completion",
            "_type" : "tech",
            "_id" : "IyA5cIQBEjlPnVjPKJpo",
            "_score" : 1.0,
            "_source" : {
              "body" : "Elasticsearch builds on top of lucene"
            }
          },
          {
            "text" : "Elasticsearch rocks",
            "_index" : "blogs_completion",
            "_type" : "tech",
            "_id" : "JCA5cIQBEjlPnVjPKJpo",
            "_score" : 1.0,
            "_source" : {
              "body" : "Elasticsearch rocks"
            }
          },
          {
            "text" : "elasticsearch is rock build",
            "_index" : "blogs_completion",
            "_type" : "tech",
            "_id" : "JyA5cIQBEjlPnVjPKJpo",
            "_score" : 1.0,
            "_source" : {
              "body" : "elasticsearch is rock build"
            }
          }
        ]
      }
    ]
  }
}

Summary:
In terms of precision: completion > phrase > term

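2. Introduction to ES Aggregations

2.1 What are ES aggregations?

Aggregation is a core database feature: it computes summary values over the data set selected by a query, such as the maximum, minimum, sum, or average of a field (or of a computed expression). ES is both a search engine and a data store, and it offers equally powerful aggregation capabilities.

Computing indicators such as max, min, sum, and avg over a data set is called a metric aggregation in ES.
Relational databases can also group the query result with group by and then compute metrics per group. In ES the equivalent of group by is bucketing, known as a bucket aggregation.

ES also provides matrix aggregations and pipeline aggregations, which were still maturing when this was written.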
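
The relationship between the two kinds can be sketched in plain Python: group the documents into buckets first, then compute a metric inside each bucket. The function name bucket_then_avg and the sample documents are made up for illustration; ES does this work on the shards, not like this.

```python
from collections import defaultdict


def bucket_then_avg(docs, bucket_field, metric_field):
    """Group docs by bucket_field (bucket agg), then average metric_field per group (metric agg)."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in buckets.items()
    }


# Hypothetical documents shaped like the bank index used later in these notes.
docs = [
    {"state": "TX", "age": 20},
    {"state": "TX", "age": 30},
    {"state": "CA", "age": 40},
]
print(bucket_then_avg(docs, "state", "age"))
```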

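2.3 How to write an aggregation query

Aggregations are defined in the search request body under the aggregations node (or its short form, aggs). Each entry has a name of your choosing, an aggregation type, and that type's parameters, and it may carry nested sub-aggregations.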
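
The shape of that node can be sketched by building a request body in Python. The helper name agg is hypothetical and the field names are taken from the bank examples further down; the printed JSON is what would go into the body of a _search request.

```python
import json


def agg(agg_type, body, sub_aggs=None):
    """Build one entry: {<type>: {<parameters>}, "aggs": {<sub-aggregations>}}."""
    entry = {agg_type: body}
    if sub_aggs:
        entry["aggs"] = sub_aggs
    return entry


request = {
    "size": 0,  # return aggregation results only, no document hits
    "aggs": {
        # "states" and "avg_age" are names we pick; "terms" and "avg" are aggregation types.
        "states": agg(
            "terms",
            {"field": "state.keyword"},
            sub_aggs={"avg_age": agg("avg", {"field": "age"})},
        ),
    },
}
print(json.dumps(request, indent=2))
```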

2.4 Where aggregated values come from

The value being aggregated can be read from a field, or it can be the result of a script.

2.5 Metric aggregations

max, min, avg, sum example
To get a different metric from the example below, replace max with min, avg, or sum:

# max min avg sum
# select max(age) from bank

POST /bank/_search
{
  "size": 0,
  "aggs": {
    "max_age": {
      "max": {
        "field": "age"
      }
    }
  }
}

# "size": 0 in the body keeps document hits out of the response.
# The ?size=0 URL parameter does the same thing:
POST /bank/_search?size=0
{
  "aggs": {
    "max_age": {
      "max": {
        "field": "age"
      }
    }
  }
}

count example

# Counting

# select count(*) from bank where age = 24

POST /bank/_count
{
  "query" : {
    "match" : {
      "age" : 24
    }
  }
}

###############################

# Second approach: value_count over the documents matched by the query
POST /bank/_search
{
  "query": {
    "term": {
      "age": {
        "value": 24
      }
    }
  },
  "aggs": {
    "age_count": {
      "value_count": {
        "field": "age"
      }
    }
  }
}

###############################

# Distinct count
# select count(*) from ( select distinct(age) from bank )
POST /bank/_search
{
  "aggs": {
    "age_count": {
      "cardinality": {
        "field": "age"
      }
    }
  }
}

Aggregating over the results of a query

# Aggregate only over the documents matched by the query
POST /bank/_search
{
  "size": 0,
  "query": {
    "match": {
      "address": "street"
    }
  },
  "aggs": {
    "max_age": {
      "max": {
        "field": "age"
      }
    }
  }
}

Aggregating over a computed value (script values)

# Aggregate a script value
# select avg(age + 1)  from bank;
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age_yearlater": {
      "avg": {
        "script": "doc.age.value + 1"
      }
    }
  }
}

# With a field specified, the script reads each field value through _value
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age_doubled": {
      "avg": {
        "field": "age",
        "script": {
          "source" : "_value * 2"
        }
      }
    }
  }
}

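Percentile ranks: quickly find out what share of the documents falls at or below a given value.
Remember that percentile_ranks means "less than or equal to": each result is the percentage of documents whose value is <= the given value.

# Lets us see what percentage of users is at or below each age
# percentile_ranks: from ages to percentages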
GET /bank/_search?size=0
{
  "aggs": {
    "TEST_NAME": {
      "percentile_ranks": {
        "field": "age",
        "values": [
          10,
          24,
          40
        ]
      }
    }
  }
}

# From percentages to ages
# percentiles
GET /bank/_search?size=0
{
  "aggs": {
    "TEST_NAME": {
      "percentiles": {
        "field": "age",
        "percents": [
          1,
          5,
          25,
          50,
          75,
          95,
          99
        ]
      }
    }
  }
}
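
The two requests above are inverses of each other, which a small Python sketch can show. This is exact arithmetic over a tiny made-up list, whereas ES computes approximate percentiles, so its numbers can differ slightly. The function names and the sample ages are assumptions for illustration.

```python
import math


def percentile_rank(values, threshold):
    """Percentage of values that are <= threshold."""
    return 100.0 * sum(v <= threshold for v in values) / len(values)


def percentile(values, percent):
    """Nearest-rank percentile: smallest value with at least `percent`% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent / 100.0 * len(ordered)))
    return ordered[rank - 1]


ages = [20, 22, 24, 24, 30, 35, 40, 40]  # hypothetical sample
print(percentile_rank(ages, 24))  # share of users aged 24 or younger
print(percentile(ages, 50))       # age at the 50th percentile
```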

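Some documents may lack the field used by a metric aggregation, for example a document whose age is absent or null. By default such documents are ignored. To count them with a default value instead, add the "missing" parameter: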

POST /bank/_search?size=0
{
  "size": 0,
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age",
        "missing": 30
      }
    }
  }
}
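
The effect of "missing" on an average can be sketched in Python. The function name avg_with_missing and the sample documents are hypothetical; the sketch only mirrors the behaviour described above, where documents without the field are skipped unless a default is supplied.

```python
def avg_with_missing(docs, field, missing=None):
    """Average of `field`; docs lacking it are skipped unless `missing` supplies a default."""
    values = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            if missing is None:
                continue  # default behaviour: ignore documents without the field
            value = missing
        values.append(value)
    return sum(values) / len(values) if values else None


docs = [{"age": 20}, {"age": 40}, {"name": "no age here"}]
print(avg_with_missing(docs, "age"))              # averages two documents
print(avg_with_missing(docs, "age", missing=60))  # averages three documents
```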

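2.6 Bucket aggregations

Straight to the examples.

Metric aggregation inside a bucket aggregation - the 10 states whose users are youngest

# 1. The 10 states with the lowest average user age
# "states" is a name we choose; "terms" is the aggregation type
# "field" is the grouping key, "order" sorts the buckets by the avg_age sub-aggregation,
# and "size" caps the number of buckets returned
# The nested "aggs" runs a metric aggregation inside each bucket
GET /bank/_search?size=0
{
  "aggs": {
    "states": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "avg_age": "asc"
        },
        "size": 10
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}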

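bucket_selector - states whose average user age is below 30

# 2. States whose average user age is below 30
# bucket_selector
# "filter_bucket" is a name we choose; bucket_selector keeps only the buckets
# for which the script returns true, reading avg_age through buckets_path
GET /bank/_search?size=0
{
  "aggs": {
    "states": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        },
        "filter_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "avg_age": "avg_age"
            },
            "script": "params.avg_age < 30"
          }
        }
      }
    }
  }
}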
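
What bucket_selector does can be sketched in Python: it runs after the buckets and their metrics already exist and simply drops the buckets that fail a condition. The function name, the predicate, and the sample buckets below are made up for illustration.

```python
def bucket_selector(buckets, keep):
    """Keep only the buckets for which `keep(bucket)` is true."""
    return [bucket for bucket in buckets if keep(bucket)]


# Hypothetical output of a terms agg on state with an avg_age sub-aggregation.
buckets = [
    {"key": "TX", "doc_count": 30, "avg_age": 28.5},
    {"key": "CA", "doc_count": 25, "avg_age": 31.0},
    {"key": "NY", "doc_count": 20, "avg_age": 29.9},
]
young_states = bucket_selector(buckets, lambda b: b["avg_age"] < 30)
print([b["key"] for b in young_states])
```

Because the filter works on computed bucket values rather than on documents, it plays a role similar to HAVING in SQL, whereas "query" corresponds to WHERE.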

filters agg - how many users are aged 20, 21, and 22 respectively?

# 4. How many users are aged 20, 21, and 22 respectively?
# filters agg
GET /bank/_search?size=0
{
  "aggs": {
    "20_21_22": {
      "filters": {
        "filters": {
          "20":{
            "term": {"age":20}
          },
          "21":{
            "term": {"age":21}
          },
          "22":{
            "term": {"age":22}
          }
        }
      }
    }
  }
}

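range agg - in each age bracket, are there more male or more female users?

# 5. In each age bracket, are there more male or more female users?
# First approach: a filters agg with one range query per bracket.
# "lt" keeps the brackets from overlapping, matching the range agg ("from" inclusive, "to" exclusive).
POST /bank/_search?size=0
{
  "aggs": {
    "age_ranges": {
      "filters": {
        "filters": {
          "10~20": {"range": {"age": {"gte": 10,"lt": 20}}},
          "20~30": {"range": {"age": {"gte": 20,"lt": 30}}},
          "30~40": {"range": {"age": {"gte": 30,"lt": 40}}}
        }
      },
      "aggs": {
        "genders": {
          "terms": {
            "field": "gender.keyword"
          }
        }
      }
    }
  }
}

# Second approach
# range agg
POST /bank/_search?size=0
{
  "aggs": {
    "age_ranges": {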
      "range": {
        "field": "age",
        "ranges": [
          {"from": 10,"to": 20},
          {"from": 20,"to": 30},
          {"from": 30,"to": 40},
          {"from": 40,"to": 50}
        ]
      },
      "aggs": {
        "genders": {
          "terms": {
            "field": "gender.keyword"
          }
        }
      }
    }
  }
}

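date_range agg - how many users registered in the last month

# How many users registered in the last month
# "now-1M/M" rounds down to the start of the previous calendar month;
# use "now-1M" for a rolling one-month window.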
POST /bank/_search?size=0
{
  "aggs": {
    "month_recent": {
      "date_range": {
        "field": "registered",
        "ranges": [
          {
            "from": "now-1M/M",
            "to": "now"
          }
        ]
      }
    }
  }
}

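date_histogram - a curve of user registrations per month

# Registrations per month
# ES 7.2+ deprecates "interval" here in favour of "calendar_interval".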
POST /bank/_search?size=0
{
  "aggs": {
    "MY_NAME": {
      "date_histogram": {
        "field": "registered",
        "interval": "month"
      }
    }
  }
}

Do any users live in the same city?

# Do any users share a city? A d_count above 1 means several users live in that city.
GET /bank/_search?size=0
{
  "aggs": {
    "same_city": {
      "terms": {
        "field": "city.keyword",
        "size": 10,
        "order": {
          "d_count": "desc"
        }
      },
      "aggs": {
        "d_count": {
          "value_count": {
            "field": "age"
          }
        }
      }
    }
  }
}
