ES Advanced Queries (this post is only a record of examples)

2022-11-13  右耳菌

1. Suggesters: query suggestions

1.1 Term suggester

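There isn't much to memorize here. The key point is the suggest_mode option, which takes one of three values:
- missing (the default): only suggest for input terms that are not in the index.
- popular: only suggest terms that occur in more documents than the input term.
- always: suggest any matching term.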

# Returns the suggestion "bristol"
POST /bank/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "text": "bristl",
      "term": {
        "field": "address",
        "suggest_mode": "missing"
      }
    }
  }
}

# No suggestion is returned: "bristol" is spelled correctly (it exists in the index), and suggest_mode is missing
POST /bank/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "text": "bristol",
      "term": {
        "field": "address",
        "suggest_mode": "missing"
      }
    }
  }
}
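To make suggest_mode concrete, here is a minimal Python sketch of the idea behind the term suggester. It assumes a toy in-memory term dictionary: `doc_freq`, `suggest_term` and the sample terms are hypothetical, not ES APIs. Candidates are ranked by plain Levenshtein distance, then by document frequency. ES's own default string distance and candidate selection differ in detail, so treat this only as an illustration of how the three modes filter suggestions.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def suggest_term(text: str, doc_freq: Counter, mode: str = "missing",
                 max_edits: int = 2) -> list:
    """Toy term suggester; doc_freq maps each indexed term to its document frequency."""
    own_freq = doc_freq.get(text, 0)
    # missing: only suggest when the input term is not in the index at all
    if mode == "missing" and own_freq > 0:
        return []
    scored = []
    for term, freq in doc_freq.items():
        if term == text:
            continue
        dist = edit_distance(text, term)
        if dist > max_edits:
            continue
        # popular: only suggest terms found in more documents than the input term
        if mode == "popular" and freq <= own_freq:
            continue
        scored.append((dist, -freq, term))  # closest first, then most frequent
    return [term for _, _, term in sorted(scored)]


index_terms = Counter({"bristol": 3, "brighton": 1, "street": 10})
print(suggest_term("bristl", index_terms))   # ['bristol']
print(suggest_term("bristol", index_terms))  # []
```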
1.2 Phrase Suggester

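This one may sound abstract. In essence, you give it a whole phrase, it checks that phrase against the documents in an index, and it returns the phrases you most likely meant. For example, suppose I type I saw your face in a crowde plac, and in the songs_v1 index crowde should have been crowded and plac should have been place. The response might then look like this: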
{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "MY_SUGGESTION" : [
      {
        "text" : "I saw your face in a crowde plac",
        "offset" : 0,
        "length" : 32,
        "options" : [
          {
            "text" : "i saw your face in a crowded place",
            "highlighted" : "i saw your face in a <em>crowded place</em>",
            "score" : 4.4348968E-7
          },
          {
            "text" : "i saw your face in a crowded plan",
            "highlighted" : "i saw your face in a <em>crowded plan</em>",
            "score" : 3.7188448E-7
          },
          {
            "text" : "i saw your face in a crowded plac",
            "highlighted" : "i saw your face in a <em>crowded</em> plac",
            "score" : 3.0497068E-7
          },
          {
            "text" : "i saw your face in a crowde place",
            "highlighted" : "i saw your face in a crowde <em>place</em>",
            "score" : 2.9133045E-7
          },
          {
            "text" : "i saw your face in a crowde plan",
            "highlighted" : "i saw your face in a crowde <em>plan</em>",
            "score" : 2.4429264E-7
          }
        ]
      }
    ]
  }
}

Example request:

POST /songs_v1/_search
{
  "suggest": {
    "MY_SUGGESTION": {
      "text": "I saw your face in a crowde plac",
      "phrase": {
        "field": "lyrics",
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
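The phrase suggester can be pictured in two steps. First, generate candidate corrections for each word, as the term suggester does. Then rank whole candidate phrases with an n-gram language model; ES documents "stupid backoff" as its default smoothing. The Python sketch below is a toy version of that idea, not ES's implementation. The class name, the one-edit candidate limit, the add-one unigram smoothing and the two-line "lyrics" corpus are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class ToyPhraseSuggester:
    """Per-word candidates, ranked as whole phrases by a bigram model with stupid backoff."""

    def __init__(self, lines, discount=0.4):
        self.unigrams, self.bigrams = Counter(), Counter()
        for line in lines:
            tokens = line.lower().split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.total = sum(self.unigrams.values())
        self.discount = discount

    def candidates(self, token, max_edits=1):
        # keep the word itself, plus indexed words within max_edits edits of it
        return {token} | {t for t in self.unigrams if edit_distance(token, t) <= max_edits}

    def unigram_prob(self, word):
        # add-one smoothing so unseen words still get a small probability
        return (self.unigrams[word] + 1) / (self.total + len(self.unigrams))

    def score(self, words):
        s = self.unigram_prob(words[0])
        for prev, word in zip(words, words[1:]):
            if self.bigrams[(prev, word)]:
                s *= self.bigrams[(prev, word)] / self.unigrams[prev]
            else:  # unseen bigram: back off to the discounted unigram probability
                s *= self.discount * self.unigram_prob(word)
        return s

    def suggest(self, text, size=5):
        words = text.lower().split()
        phrases = {" ".join(c) for c in product(*(self.candidates(w) for w in words))}
        return sorted(phrases, key=lambda p: self.score(p.split()), reverse=True)[:size]


lyrics = ["I saw your face in a crowded place", "We made a plan"]
suggester = ToyPhraseSuggester(lyrics)
print(suggester.suggest("I saw your face in a crowde plac")[0])
# i saw your face in a crowded place
```

Because the model scores whole phrases, "crowded place" wins over "crowded plan" here: the first bigram was actually seen in the corpus, the second was not.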
1.3 Completion suggester: autocomplete

Under the hood it uses an FST (finite state transducer) data structure and performs prefix lookups.

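# completion suggester
# autocomplete: FST + prefix lookup

# Mapping a field as type completion is what makes ES build the FST for that field.
# (This uses the ES 6.x mapping-type syntax, "tech"; mapping types were deprecated in ES 7 and removed in ES 8.)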
PUT /blogs_completion
{
  "mappings": {
    "tech": {
      "properties": {
        "body": {
          "type":"completion" 
        }
      }
    }
  }
}


POST _bulk?refresh=true
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "Lucene is cool"}
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "Elasticsearch builds on top of lucene"}
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "Elasticsearch rocks"}
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "Elastic is the company behind ELK stack"}
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "the elk stack rocks"}
{"index": { "_index":"blogs_completion", "_type": "tech"}}
{"body": "elasticsearch is rock build"}
POST /blogs_completion/_search
{
  "size": 0,
  "suggest": {
    "MY_SUGGESTION": {
      "text": "elastic",
      "completion": {
        "field": "body"
      }
    }
  }
}
{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "MY_SUGGESTION" : [
      {
        "text" : "elastic",
        "offset" : 0,
        "length" : 7,
        "options" : [
          {
            "text" : "Elastic is the company behind ELK stack",
            "_index" : "blogs_completion",
            "_type" : "tech",
            "_id" : "JSA5cIQBEjlPnVjPKJpo",
            "_score" : 1.0,
            "_source" : {
              "body" : "Elastic is the company behind ELK stack"
            }
          },
          {
            "text" : "Elasticsearch builds on top of lucene",
            "_index" : "blogs_completion",
            "_type" : "tech",
            "_id" : "IyA5cIQBEjlPnVjPKJpo",
            "_score" : 1.0,
            "_source" : {
              "body" : "Elasticsearch builds on top of lucene"
            }
          },
          {
            "text" : "Elasticsearch rocks",
            "_index" : "blogs_completion",
            "_type" : "tech",
            "_id" : "JCA5cIQBEjlPnVjPKJpo",
            "_score" : 1.0,
            "_source" : {
              "body" : "Elasticsearch rocks"
            }
          },
          {
            "text" : "elasticsearch is rock build",
            "_index" : "blogs_completion",
            "_type" : "tech",
            "_id" : "JyA5cIQBEjlPnVjPKJpo",
            "_score" : 1.0,
            "_source" : {
              "body" : "elasticsearch is rock build"
            }
          }
        ]
      }
    ]
  }
}
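The response above shows two properties of the completion suggester. Matching is on a prefix of the whole input string, so "the elk stack rocks" and "...on top of lucene" do not match. Matching is also case-insensitive here, because the default analyzer for completion fields (simple) lowercases. ES answers these lookups from an FST. The Python sketch below swaps in a simpler stand-in, a sorted list searched with `bisect`, purely to show the prefix semantics. `PrefixCompleter` is a hypothetical name, and the ordering of equal-score results is not meant to match ES.

```python
import bisect


class PrefixCompleter:
    """Sorted-list stand-in for a completion field: whole-input prefix matching."""

    def __init__(self, inputs):
        # (normalized key, original text) pairs sorted by key; lowercasing loosely
        # mimics what the analyzer does to the completion field
        self.entries = sorted((text.lower(), text) for text in inputs)
        self.keys = [key for key, _ in self.entries]

    def suggest(self, prefix, size=5):
        p = prefix.lower()
        # binary search to the first key >= prefix; all matches are contiguous from there
        start = bisect.bisect_left(self.keys, p)
        results = []
        for key, text in self.entries[start:]:
            if len(results) == size or not key.startswith(p):
                break
            results.append(text)
        return results


bodies = [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "the elk stack rocks",
    "elasticsearch is rock build",
]
completer = PrefixCompleter(bodies)
print(completer.suggest("elastic"))  # the same 4 inputs as in the ES response above
print(completer.suggest("luc"))      # ['Lucene is cool'] only
```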

Summary:
In terms of precision: completion > phrase > term


2. Introduction to ES aggregations

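2.1 What are ES aggregations?

Aggregation is an important database feature. It computes aggregate values over the data set returned by a query, for example the maximum or minimum of a field (or of a computed expression), or a sum or average. As both a search engine and a database, ES also provides powerful aggregation capabilities.

ES additionally provides matrix aggregations (matrix) and pipeline aggregations (pipeline), though at the time these were still being refined.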

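2.3 How to write an aggregation query

In the request body, aggregations are defined under an aggregations (or aggs) node:
- Each entry gives the aggregation a name of your choice.
- That name maps to exactly one aggregation type together with its body.
- An optional nested aggs node holds sub-aggregations, which run inside each bucket.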
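As a hedged illustration of that nesting, here is a Python sketch that builds a request body as plain dicts and prints the JSON you would send to `_search`. The `aggregation` helper is hypothetical, just a convenience for this example. The body it builds has the same shape as the terms + avg bucket example in section 2.6.

```python
import json


def aggregation(agg_type, body, sub_aggs=None):
    """One named aggregation node: exactly one type key, plus optional nested "aggs"."""
    node = {agg_type: body}
    if sub_aggs:
        node["aggs"] = sub_aggs
    return node


request_body = {
    "size": 0,  # only the aggregation results, no hits
    "aggs": {
        # "<aggregation_name>": {"<aggregation_type>": {...}, "aggs": {...}}
        "states": aggregation(
            "terms",
            {"field": "state.keyword"},
            sub_aggs={"avg_age": aggregation("avg", {"field": "age"})},
        )
    },
}
print(json.dumps(request_body, indent=2))
```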


2.4 Where aggregation values come from

The values an aggregation computes over can come from a field, or can be the result of a script.

2.5 Metric aggregations
# max min avg sum
# select max(age) from bank

POST /bank/_search
{
  "size": 0,
  "aggs": {
    "max_age": {
      "max": {
        "field": "age"
      }
    }
  }
}

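# Without size 0, the response would also carry back a page of matching documents (hits).
# To get only the aggregation result, set "size": 0 in the body (as above) or pass ?size=0 in the URL.
# The request below uses both; either one alone is enough.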
POST /bank/_search?size=0
{
  "size": 0,
  "aggs": {
    "max_age": {
      "max": {
        "field": "age"
      }
    }
  }
}
# Count

# select count(*) from bank where age = 24

POST /bank/_count
{
  "query" : {
    "match" : {
      "age" : 24
    }
  }
}

###############################

# Approach 2: filter with a term query, then count the age values with value_count
POST /bank/_search
{
  "query": {
    "term": {
      "age": {
        "value": 24
      }
    }
  },
  "aggs": {
    "age_count": {
      "value_count": {
        "field": "age"
      }
    }
  }
}

###############################

# Distinct count (cardinality; ES computes it approximately)
# select count(*) from ( select distinct(age) from bank )
POST /bank/_search
{
  "aggs": {
    "age_count": {
      "cardinality": {
        "field": "age"
      }
    }
  }
}
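The three ways of counting above differ in what they count. The Python sketch below shows the difference on a handful of made-up documents. Keep in mind that ES computes cardinality approximately (HyperLogLog++ based), whereas the Python set here is exact.

```python
docs = [
    {"age": 24}, {"age": 24}, {"age": 31}, {"age": 40}, {"firstname": "no age field"},
]

# _count with a query: number of matching documents (here: age = 24)
count_24 = sum(1 for d in docs if d.get("age") == 24)

# value_count: how many values the field has (duplicates included, missing skipped)
value_count = sum(1 for d in docs if "age" in d)

# cardinality: how many distinct values (exact here, approximate in ES)
cardinality = len({d["age"] for d in docs if "age" in d})

print(count_24, value_count, cardinality)  # 2 4 3
```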
# Aggregate over the results of a query
POST /bank/_search
{
  "size": 0,
  "query": {
    "match": {
      "address": "street"
    }
  },
  "aggs": {
    "max_age": {
      "max": {
        "field": "age"
      }
    }
  }
}
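The point of the request above is that the aggregation only sees the documents matched by the query, not the whole index. The Python sketch below shows this with made-up accounts. The match query is crudely approximated by a lowercase token check; real match queries go through the field's analyzer.

```python
accounts = [
    {"address": "880 Holmes Lane", "age": 40},
    {"address": "671 Bristol Street", "age": 36},
    {"address": "789 Madison Street", "age": 28},
]

# query: keep only documents whose address contains the token "street"
hits = [a for a in accounts if "street" in a["address"].lower().split()]

# aggs: max runs over the query results only (the index-wide max would be 40)
max_age = max(a["age"] for a in hits)
print(len(hits), max_age)  # 2 36
```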
# Aggregate over a script value
# select avg(age + 1)  from bank;
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age_yearlater": {
      "avg": {
        "script": "doc['age'].value + 1"
      }
    }
  }
}

# Specify a field, then transform each of its values with a value script (_value is the current field value)
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age_doubled": {
      "avg": {
        "field": "age",
        "script": {
          "source" : "_value * 2"
        }
      }
    }
  }
}
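The value sources from 2.4 map onto the two requests above as follows, sketched in Python over a made-up list of ages:
- field only: aggregate the raw values.
- script only: compute a value per document.
- field plus value script: transform each field value through `_value`.

```python
ages = [20, 25, 36]

# "field": "age" -> aggregate the raw field values
avg_field = sum(ages) / len(ages)

# script only (the doc's age + 1) -> the script computes one value per document
avg_plus_one = sum(age + 1 for age in ages) / len(ages)

# "field": "age" plus value script "_value * 2" -> each field value is transformed
avg_doubled = sum(value * 2 for value in ages) / len(ages)

print(avg_field, avg_plus_one, avg_doubled)  # 27.0 28.0 54.0
```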

# Quickly see what share of users are at or below a given age
# percentile_ranks: given age values, returns the percentage of users at or below each
GET /bank/_search?size=0
{
  "aggs": {
    "TEST_NAME": {
      "percentile_ranks": {
        "field": "age",
        "values": [
          10,
          24,
          40
        ]
      }
    }
  }
}
# percentiles: given percentages, returns the corresponding ages
GET /bank/_search?size=0
{
  "aggs": {
    "TEST_NAME": {
      "percentiles": {
        "field": "age",
        "percents": [
          1,
          5,
          25,
          50,
          75,
          95,
          99
        ]
      }
    }
  }
}
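percentile_ranks and percentiles are inverse views of the same distribution. The Python sketch below computes both exactly on made-up ages, using the nearest-rank definition of a percentile. ES estimates these values approximately (TDigest by default) and interpolates, so on real data its numbers can differ slightly from this exact sketch.

```python
import math


def percentile_rank(values, v):
    """Percentage of values that are <= v (what percentile_ranks estimates)."""
    return 100.0 * sum(1 for x in values if x <= v) / len(values)


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


ages = [20, 22, 24, 26, 28, 30, 32, 34, 36, 40]
print([percentile_rank(ages, v) for v in (10, 24, 40)])  # [0.0, 30.0, 100.0]
print({p: percentile(ages, p) for p in (1, 50, 99)})     # {1: 20, 50: 28, 99: 40}
```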
# missing: documents without an age field are counted as if age were 30
POST /bank/_search?size=0
{
  "size": 0,
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age",
        "missing": 30
      }
    }
  }
}
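The effect of "missing" in the request above, sketched in Python on made-up documents. Without it, documents lacking the field are skipped. With it, they take part in the average using the substitute value.

```python
docs = [{"age": 24}, {"age": 48}, {"firstname": "no age field"}]

# without "missing": documents lacking the field are simply skipped
present = [d["age"] for d in docs if "age" in d]
avg_skip = sum(present) / len(present)

# with "missing": 30, a document without age counts as if age were 30
filled = [d.get("age", 30) for d in docs]
avg_missing = sum(filled) / len(filled)

print(avg_skip, avg_missing)  # 36.0 34.0
```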
2.6 Bucket aggregations

Straight to the examples.

# 1. The 10 states whose users have the lowest average age
GET /bank/_search?size=0
{
  "aggs": {
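    "states": { # aggregation name (your choice)
      "terms": { # aggregation type
        "field": "state.keyword", # field to group by
        "order": { # sort the buckets
          "avg_age": "asc"
        },
        "size": 10 # number of buckets to return
      },
      "aggs": { # sub-aggregation computed within each bucket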
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
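What the request above does, sketched in Python on made-up accounts:
1. Build one bucket per state.
2. Compute the avg_age sub-aggregation inside each bucket.
3. Sort the buckets by that average, ascending, and keep `size` of them.

On a multi-shard index the real ES result can be approximate: the ES docs discourage ordering terms buckets by ascending count or by a sub-aggregation because it increases the error on document counts.

```python
from collections import defaultdict

accounts = [
    {"state": "TX", "age": 35}, {"state": "TX", "age": 25},
    {"state": "IL", "age": 22}, {"state": "CA", "age": 40},
    {"state": "CA", "age": 38}, {"state": "IL", "age": 24},
]

# terms: one bucket per distinct state value
buckets = defaultdict(list)
for acc in accounts:
    buckets[acc["state"]].append(acc["age"])

# sub-aggregation: avg age computed inside each bucket
avg_age = {state: sum(ages) / len(ages) for state, ages in buckets.items()}

# "order": {"avg_age": "asc"} and "size": 10
size = 10
youngest = sorted(avg_age.items(), key=lambda kv: kv[1])[:size]
print(youngest)  # [('IL', 23.0), ('TX', 30.0), ('CA', 39.0)]
```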
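# 2. States where the average user age is below 30
# bucket_selector (terms returns only the top 10 states by doc count by default, so only those are filtered)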
GET /bank/_search?size=0
{
  "aggs": {
    "states": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        },
        "filter_bucket": { # filter the buckets
          "bucket_selector": { # bucket_selector syntax
            "buckets_path": {
              "avg_age": "avg_age"
            },
            "script": "params.avg_age < 30"
          }
        }
      }
    }
  }
}
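bucket_selector is a pipeline aggregation that runs after the parent terms buckets and their avg_age metrics have been computed. For each bucket, buckets_path binds a script variable (params.avg_age) to a metric, and the bucket is dropped when the script returns false. A Python sketch of that loop follows. The `bucket_selector` function and the bucket data are hypothetical, and the Python lambda stands in for the Painless script.

```python
def bucket_selector(buckets, buckets_path, script):
    """Keep a bucket only if script(params) is true; params are resolved via buckets_path."""
    kept = []
    for bucket in buckets:
        params = {var: bucket[path] for var, path in buckets_path.items()}
        if script(params):
            kept.append(bucket)
    return kept


buckets = [
    {"key": "IL", "doc_count": 2, "avg_age": 23.0},
    {"key": "TX", "doc_count": 2, "avg_age": 30.0},
    {"key": "CA", "doc_count": 2, "avg_age": 39.0},
]
kept = bucket_selector(buckets, {"avg_age": "avg_age"}, lambda params: params["avg_age"] < 30)
print([b["key"] for b in kept])  # ['IL']
```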
# 4. How many users are aged 20, 21, and 22, respectively?
# filters aggregation
GET /bank/_search?size=0
{
  "aggs": {
    "20_21_22": {
      "filters": {
        "filters": {
          "20":{
            "term": {"age":20}
          },
          "21":{
            "term": {"age":21}
          },
          "22":{
            "term": {"age":22}
          }
        }
      }
    }
  }
}
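A filters aggregation creates one named bucket per filter and counts every document that matches it. A document that matches several filters is counted in each of those buckets, which matters whenever the filters overlap. The Python sketch below shows this; `filters_agg` is a hypothetical helper and the ages are made up.

```python
def filters_agg(docs, named_filters):
    """One bucket per named filter; a document is counted in every bucket it matches."""
    return {name: sum(1 for d in docs if pred(d)) for name, pred in named_filters.items()}


docs = [{"age": a} for a in (20, 21, 21, 22, 25)]
counts = filters_agg(docs, {
    "20": lambda d: d["age"] == 20,
    "21": lambda d: d["age"] == 21,
    "22": lambda d: d["age"] == 22,
})
print(counts)  # {'20': 1, '21': 2, '22': 1}

# overlapping filters: age 20 lands in both buckets
overlap = filters_agg(docs, {
    "10~20": lambda d: 10 <= d["age"] <= 20,
    "20~30": lambda d: 20 <= d["age"] <= 30,
})
print(overlap)  # {'10~20': 1, '20~30': 5}
```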
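# 5. In each age group, are there more male or female users?
# Approach 1: filters aggregation with a terms sub-aggregation on gender
# (lt on the upper bound keeps the groups from overlapping; with lte, a 20-year-old would be counted in both 10~20 and 20~30)
POST /bank/_search?size=0
{
  "aggs": {
    "age_groups": {
      "filters": {
        "filters": {
          "10~20": {"range": {"age": {"gte": 10,"lt": 20}}},
          "20~30": {"range": {"age": {"gte": 20,"lt": 30}}},
          "30~40": {"range": {"age": {"gte": 30,"lt": 40}}}
        }
      },
      "aggs": {
        "genders": {
          "terms": {
            "field": "gender.keyword"
          }
        }
      }
    }
  }
}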

# Approach 2: range aggregation ("from" is inclusive, "to" is exclusive) with a gender sub-aggregation
POST /bank/_search?size=0
{
  "aggs": {
    "age_ranges": {
      "range": {
        "field": "age",
        "ranges": [
          {"from": 10,"to": 20},
          {"from": 20,"to": 30},
          {"from": 30,"to": 40},
          {"from": 40,"to": 50}
        ]
      },
      "aggs": {
        "genders": {
          "terms": {
            "field": "gender.keyword"
          }
        }
      }
    }
  }
}
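The range aggregation includes each range's "from" value and excludes its "to" value, so adjacent ranges such as 20-30 and 30-40 never double count. The Python sketch below mirrors the request above: range buckets, each with a gender breakdown. `range_agg`, its "lo-hi" bucket keys and the sample docs are hypothetical; ES formats range keys differently.

```python
from collections import Counter


def range_agg(docs, field, ranges):
    """Range buckets ("from" inclusive, "to" exclusive) with a terms-style gender count."""
    out = {}
    for r in ranges:
        lo, hi = r["from"], r["to"]
        in_range = [d for d in docs if lo <= d[field] < hi]
        # sub-aggregation: count genders inside this range bucket
        out[f"{lo}-{hi}"] = Counter(d["gender"] for d in in_range)
    return out


docs = [
    {"age": 20, "gender": "M"}, {"age": 25, "gender": "F"},
    {"age": 29, "gender": "F"}, {"age": 30, "gender": "M"},
    {"age": 35, "gender": "M"},
]
result = range_agg(docs, "age", [{"from": 20, "to": 30}, {"from": 30, "to": 40}])
print(result)
```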
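# How many users registered recently
# now-1M/M rounds down to the first day of the previous month, so the range runs from the start of last month until now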
POST /bank/_search?size=0
{
  "aggs": {
    "month_recent": {
      "date_range": {
        "field": "registered",
        "ranges": [
          {
            "from": "now-1M/M",
            "to": "now"
          }
        ]
      }
    }
  }
}
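The date math in "now-1M/M" works in two steps: "-1M" goes back one month, then "/M" rounds down to the start of that month. The Python sketch below reproduces that lower bound. The function name is hypothetical, and it uses naive datetimes, whereas ES evaluates date math in UTC unless a time_zone is given. Because of the rounding, clamping the day (for example March 31 minus one month) never matters here.

```python
from datetime import datetime


def start_of_previous_month(now: datetime) -> datetime:
    """Mimic "now-1M/M": go back one month, then round down to the month start."""
    year, month = now.year, now.month - 1
    if month == 0:  # January -> December of the previous year
        year, month = year - 1, 12
    return datetime(year, month, 1)


print(start_of_previous_month(datetime(2022, 11, 13, 9, 30)))  # 2022-10-01 00:00:00
print(start_of_previous_month(datetime(2023, 1, 5)))           # 2022-12-01 00:00:00
```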
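# Monthly registration counts (the data behind a line chart)
# On ES 7.2+ use "calendar_interval": "month"; "interval" is deprecated there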
POST /bank/_search?size=0
{
  "aggs": {
    "MY_NAME": {
      "date_histogram": {
        "field": "registered",
        "interval": "month"
      }
    }
  }
}
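A date_histogram groups documents into calendar buckets. With its default min_doc_count of 0, it also returns empty buckets between the first and last non-empty ones, which keeps a line chart continuous. The Python sketch below reproduces that behaviour for monthly buckets on made-up registration dates. `monthly_histogram` and its "YYYY-MM" keys are hypothetical; ES returns epoch-millisecond keys plus a key_as_string.

```python
from collections import Counter
from datetime import date


def monthly_histogram(dates):
    """Count per calendar month, including empty months between the first and last."""
    counts = Counter((d.year, d.month) for d in dates)
    if not counts:
        return []
    (y, m), last = min(counts), max(counts)
    buckets = []
    while (y, m) <= last:
        buckets.append((f"{y:04d}-{m:02d}", counts.get((y, m), 0)))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return buckets


registered = [date(2022, 1, 3), date(2022, 1, 20), date(2022, 3, 8)]
print(monthly_histogram(registered))
# [('2022-01', 2), ('2022-02', 0), ('2022-03', 1)]
```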
# Do any users live in the same city? (cities ordered by number of users, descending)
GET /bank/_search?size=0
{
  "aggs": {
    "same_city": {
      "terms": {
        "field": "city.keyword",
        "size": 10,
        "order": {
          "d_count": "desc"
        }
      },
      "aggs": {
        "d_count": {
          "value_count": {
            "field": "age"
          }
        }
      }
    }
  }
}
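The request above lists the top cities by user count. Answering "does anyone share a city?" then means looking for buckets with a count above 1. The terms aggregation's min_doc_count parameter (for example "min_doc_count": 2) can do that filtering inside ES. A Python sketch of the same logic follows; the city names are made up.

```python
from collections import Counter

cities = ["Springfield", "Riverside", "Springfield", "Fairview", "Riverside", "Springfield"]

# terms on city.keyword: each bucket's count, ordered by count descending, top 10
by_city = Counter(cities).most_common(10)

# cities that actually have more than one user (what min_doc_count: 2 would keep)
shared = [(city, n) for city, n in by_city if n > 1]
print(shared)  # [('Springfield', 3), ('Riverside', 2)]
```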
