ElasticSearch-聚合
什么是聚合
每个聚合都是一个或者多个桶和零个或者多个指标的组合。
桶(Buckets)
满足特定条件的文档的集合。
指标(Metrics)
对桶内的文档进行统计计算。
聚合语法结构
"aggregation" : {
"<aggregation_name>" : {
"aggregation_type" : {
<aggregation>
}
[, "meta" : {[<meta_data_body>]}]?
[, "aggregation" : {[<sub_aggregation>]}]?
}
[, "<aggregation_name_2>" : {...}]*
}
Bucket Aggregation-分桶
Filter Aggregation -- 过滤分桶
Filters Aggregation -- 过滤分桶
Date Histogram Aggregation -- 按照日期自动划分桶
Date Range Aggregation -- 给定日期范围划分
Histogram Aggregation -- 直方图划分桶
Range Aggregation -- 给定范围划分桶
IP Range Aggregation -- 按照给定ip范围分桶
Terms Aggregatioon -- 按照最多的词条分桶
Geo Distance Aggregation -- 按地理位置指定的中心点园环分桶
GeoHash grid Aggregation -- 按geohash单元分桶
Metrics Aggregation-指标
Avg Aggregation -- 平均值
Max Aggregation -- 最大值
Min Aggregation -- 最小值
Sum Aggregation -- 求和
Cardinality Aggregation -- 基数(去重值)
Percentiles Aggregation -- 百分位
Percentile Ranks Aggregation -- 百分位排名
Stats Aggregation -- 统计(包含min、max、sum、avg)
Geo Bounds Aggregation -- 地理坐标边框
Geo Centroid Aggregation -- 图心
初始化数据
DELETE cars
PUT cars
{
"mappings": {
"transactions": {
"properties": {
"price": {
"type":"long"
},
"color": {
"type":"keyword"
},
"make": {
"type":"keyword"
},
"sold": {
"type":"date"
}
}
}
}
}
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
select count(color) from cars group by color;
GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"popular_colors" : {
"terms" : {
"field" : "color"
}
}
}
}
每种颜色汽车的平均价格是多少?
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
每个颜色的汽车制造商的分布
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
},
"make": {
"terms": {
"field": "make"
}
}
}
}
}
}
为每个汽车生成商计算最低和最高的价格
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": { "avg": { "field": "price" }
},
"make" : {
"terms" : {
"field" : "make"
},
"aggs" : {
"min_price" : { "min": { "field": "price"} },
"max_price" : { "max": { "field": "price"} }
}
}
}
}
}
}
直方图
GET /cars/transactions/_search
{
"size" : 0,
"aggs":{
"price":{
"histogram":{
"field": "price",
"interval": 20000
},
"aggs":{
"price_sum": {
"sum": {
"field" : "price"
}
}
}
}
}
}
最受欢迎 10 种汽车以及它们的平均售价、标准差
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"makes": {
"terms": {
"field": "make",
"size": 10
},
"aggs": {
"stats": {
"extended_stats": {
"field": "price"
}
}
}
}
}
}
时间条形图
今年每月销售多少台汽车?
这只股票最近 12 小时的价格是多少?
我们网站上周每小时的平均响应延迟时间是多少?
每月销售多少台汽车
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales": {
"date_histogram": {
"field": "sold",
"interval": "month",
"format": "yyyy-MM-dd"
}
}
}
}
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales": {
"date_histogram": {
"field": "sold",
"interval": "month",
"format": "yyyy-MM-dd",
"min_doc_count" : 0,
"extended_bounds" : {
"min" : "2014-01-01",
"max" : "2014-12-31"
}
}
}
}
}
同时按季度、按每个汽车品牌计算销售总额,以便可以找出哪种品牌最赚钱:
间间隔从 month 改成了 quarter
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales": {
"date_histogram": {
"field": "sold",
"interval": "quarter",
"format": "yyyy-MM-dd",
"min_doc_count" : 0,
"extended_bounds" : {
"min" : "2014-01-01",
"max" : "2014-12-31"
}
},
"aggs": {
"per_make_sum": {
"terms": {
"field": "make"
},
"aggs": {
"sum_price": {
"sum": { "field": "price" }
}
}
},
"total_sum": {
"sum": { "field": "price" }
}
}
}
}
}
查询某一个范围的聚合
GET /cars/transactions/_search
{
"size": 0,
"query" : {
"match" : {
"make" : "ford"
}
},
"aggs" : {
"colors" : {
"terms" : {
"field" : "color"
}
}
}
}
全局桶,query只是对single_avg_price起作用
GET /cars/transactions/_search
{
"size" : 0,
"query" : {
"match" : {
"make" : "ford"
}
},
"aggs" : {
"single_avg_price": {
"avg" : { "field" : "price" }
},
"all": {
"global" : {},
"aggs" : {
"avg_price": {
"avg" : { "field" : "price" }
}
}
}
}
}
过滤
GET /cars/transactions/_search
{
"size" : 0,
"query" : {
"constant_score": {
"filter": {
"range": {
"price": {
"gte": 10000
}
}
}
}
},
"aggs" : {
"single_avg_price": {
"avg" : { "field" : "price" }
}
}
}
可以指定一个过滤桶,当文档满足过滤桶的条件时,将其加入到桶内
不过滤搜索结果,对聚合结果进行过滤
GET /cars/transactions/_search
{
"size" : 0,
"query":{
"match": {
"make": "ford"
}
},
"aggs":{
"recent_sales": {
"filter": {
"range": {
"sold": {
"from": "now-1M"
}
}
},
"aggs": {
"average_price":{
"avg": {
"field": "price"
}
}
}
}
}
}
只过滤搜索结果,不过滤聚合结果(注意hits.total的变化)
GET /cars/transactions/_search
{
"size" : 0,
"query": {
"match": {
"make": "ford"
}
},
"post_filter": {
"term" : {
"color" : "green"
}
},
"aggs" : {
"all_colors": {
"terms" : { "field" : "color" }
}
}
}
内置排序
GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"colors" : {
"terms" : {
"field" : "color",
"order": {
"_count" : "asc"
}
}
}
}
}
_count
按文档数排序。对 terms 、 histogram 、 date_histogram 有效。
_term
按词项的字符串值的字母顺序排序。只在 terms 内使用。
_key
按每个桶的键值数值排序(理论上与 _term 类似)。 只在 histogram 和 date_histogram 内使用。
按照汽车颜色创建一个销售条状图表,但按照汽车平均售价的升序进行排序
GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"colors" : {
"terms" : {
"field" : "color",
"order": {
"avg_price" : "asc"
}
},
"aggs": {
"avg_price": {
"avg": {"field": "price"}
}
}
}
}
}
多值度量排序
GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"colors" : {
"terms" : {
"field" : "color",
"order": {
"stats.variance" : "asc"
}
},
"aggs": {
"stats": {
"extended_stats": {"field": "price"}
}
}
}
}
}
嵌套度量
stats 度量是 red_green_cars 聚合的子节点,而 red_green_cars 又是 colors 聚合的子节点
嵌套路径上的每个桶都必须是单值的,度量用尖括号( > )嵌套起来
GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"colors" : {
"histogram" : {
"field" : "price",
"interval": 20000,
"order": {
"red_green_cars>stats.variance" : "asc"
}
},
"aggs": {
"red_green_cars": {
"filter": { "terms": {"color": ["red", "green"]}},
"aggs": {
"stats": {"extended_stats": {"field" : "price"}}
}
}
}
}
}
}
ES有两种近似算法( cardinality 和 percentiles ),它们会提供准确但不是 100% 精确的结果。 以牺牲一点小小的估算错误为代价,这些算法可以为我们换来高速的执行效率和极小的内存消耗。
近似计算的结果会在毫秒内返回,而“完全正确”的结果就可能需要几秒,甚至无法返回。
销售汽车颜色的数量
GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"distinct_colors" : {
"cardinality" : {
"field" : "color"
}
}
}
}
precision_threshold 接受 0–40,000 之间的数字,更大的值还是会被当作 40,000 来处理
GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"distinct_colors" : {
"cardinality" : {
"field" : "color",
"precision_threshold" : 100
}
}
}
}
百分位度量
DELETE website
PUT website
{
"mappings": {
"logs": {
"properties": {
"latency": {
"type":"long"
},
"zone": {
"type":"keyword"
},
"timestamp": {
"type":"date"
}
}
}
}
}
POST /website/logs/_bulk
{ "index": {}}
{ "latency" : 100, "zone" : "US", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 80, "zone" : "US", "timestamp" : "2014-10-29" }
{ "index": {}}
{ "latency" : 99, "zone" : "US", "timestamp" : "2014-10-29" }
{ "index": {}}
{ "latency" : 102, "zone" : "US", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 75, "zone" : "US", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 82, "zone" : "US", "timestamp" : "2014-10-29" }
{ "index": {}}
{ "latency" : 100, "zone" : "EU", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 280, "zone" : "EU", "timestamp" : "2014-10-29" }
{ "index": {}}
{ "latency" : 155, "zone" : "EU", "timestamp" : "2014-10-29" }
{ "index": {}}
{ "latency" : 623, "zone" : "EU", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 380, "zone" : "EU", "timestamp" : "2014-10-28" }
{ "index": {}}
{ "latency" : 319, "zone" : "EU", "timestamp" : "2014-10-29" }
查看平均响应延迟时间,百分位时间
GET /website/logs/_search
{
"size" : 0,
"aggs" : {
"load_times" : {
"percentiles" : {
"field" : "latency"
}
},
"avg_load_time" : {
"avg" : {
"field" : "latency"
}
}
}
}
查看延迟时间跟地理位置的延迟是否有关的百分位
GET /website/logs/_search
{
"size" : 0,
"aggs" : {
"zones" : {
"terms" : {
"field" : "zone"
},
"aggs" : {
"load_times" : {
"percentiles" : {
"field" : "latency",
"percents" : [50, 95.0, 99.0]
}
},
"load_avg" : {
"avg" : {
"field" : "latency"
}
}
}
}
}
}
查看某一个值属于哪个百分位(percentile_ranks)
GET /website/logs/_search
{
"size" : 0,
"aggs" : {
"zones" : {
"terms" : {
"field" : "zone"
},
"aggs" : {
"load_times" : {
"percentile_ranks" : {
"field" : "latency",
"values" : [210, 800]
}
}
}
}
}
}