
Elasticsearch: Basic Concepts and Query Syntax

2017-08-11  Hqmm

Preface

The later sections contain many queries similar to MySQL's SUM and GROUP BY.

ELK
===

ELK overall architecture

https://www.elastic.co/cn/products

Beats

Lightweight data shippers written in Go. They read data and quickly send it to Logstash for parsing, or directly to Elasticsearch for centralized storage and analysis.

Logstash

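Logstash is an open-source, server-side data processing pipeline. It can ingest data from multiple sources at the same time, transform it, and then send it to Elasticsearch (es) for storage.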

Elasticsearch

Elasticsearch is a JSON-based distributed search and analytics engine. It implements full-text indexing with an inverted index.
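
To make the inverted index idea concrete, here is a minimal Python sketch. It is only a toy: the sample documents are hypothetical, and the tokenizer is a plain whitespace split, whereas a real Elasticsearch analyzer does much more (normalization, token filters, positions).

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents, used only for illustration.
docs = {
    1: "elasticsearch stores json documents",
    2: "kibana visualizes elasticsearch data",
    3: "logstash ships data",
}

index = build_inverted_index(docs)
print(index["elasticsearch"])  # [1, 2]
print(index["data"])           # [2, 3]
```

A search for a term becomes a dictionary lookup that returns the matching document ids directly, instead of a scan over every document.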

Kibana
Kibana visualizes the data stored in Elasticsearch and lets you operate on it.

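Elasticsearch

es sits at the core of the ELK ecosystem. It is an open-source, large-scale full-text search and analytics engine built on an inverted index, and it supports storing, searching, and analyzing data in near real time.
Advantages:

Basic concepts

Elasticsearch query syntax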

_cat API

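Returns information about the current es cluster, including the number of indices in the cluster, its running status, and the IP addresses it runs on. Its purpose is to present the results in a more human-friendly format.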

Search APIs

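Searches data, with a rich query syntax and powerful features.
REST request URI: a lightweight, quick way to query through URI parameters
REST request body: a JSON-format query method that can carry many conditions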

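"script_fields": {
    "FIELD": {# name of the value computed by the script
      "script": {# the computation performed by the script
      }
    }
  }
"aggs": {
    "NAME": {# name of the result
      "AGG_TYPE": {# the specific aggregation type
        TODO: # inside the aggregation body, specify the field to aggregate on
      }
      TODO: # a nested "aggs" block can go here
    }
  }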
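
The skeleton above can be filled in programmatically. The following is a small sketch in Python; the helper names `script_field` and `agg` are hypothetical (they are not part of any client library), and the `inline` key simply mirrors the script examples later in this article.

```python
import json

def script_field(name, source, lang="painless"):
    """Return one script_fields entry: {name: {"script": {...}}}."""
    return {name: {"script": {"lang": lang, "inline": source}}}

def agg(name, agg_type, body, sub_aggs=None):
    """Return one named aggregation, optionally with nested sub-aggregations."""
    node = {agg_type: body}
    if sub_aggs:
        # Nested aggregations sit next to the aggregation type, inside the name.
        node["aggs"] = sub_aggs
    return {name: node}

fields = script_field("cti", "1.0 * doc['install'].value / doc['click'].value")
aggs = agg(
    "types", "terms", {"field": "type.keyword"},
    sub_aggs=agg("install", "sum", {"field": "install"}),
)

print(json.dumps({"script_fields": fields}, indent=2))
print(json.dumps({"size": 0, "aggs": aggs}, indent=2))
```

Building the body as plain dictionaries keeps the nesting explicit, which helps when an aggregation grows several levels deep.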

Query DSL

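Query DSL is the complete, JSON-based structured query language provided by es. It is made up of two kinds of query clauses: leaf query clauses (such as match, term, and range) and compound query clauses (such as bool).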

Query and filter context

The behavior of a query clause depends on whether it is used in query context or in filter context.
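
As a rough illustration of the two contexts: a clause in query context answers "how well does this document match?" and contributes to `_score`, while a clause in filter context only answers yes or no and is not scored. The sketch below builds such a request body in Python; the field names are borrowed from the rta_daily_report examples in this article, and the particular combination is an assumption made for illustration.

```python
import json

body = {
    "query": {
        "bool": {
            # Query context: this clause influences the relevance score.
            "must": [{"match": {"type": "rba"}}],
            # Filter context: documents either pass or fail, with no scoring.
            "filter": [{"range": {"click": {"gt": 0}}}],
        }
    }
}

print(json.dumps(body, indent=2))
```

Conditions that are pure yes/no checks (ranges, exact values) usually fit better in `filter`, since their score is never needed.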

Elasticsearch query examples

_cat API query examples

Use _cat to check the running status of the current es cluster

Kibana’s Console: `GET /_cat/health?v`
curl: `curl -XGET "127.0.0.1:9200/_cat/health?v"`

Use _cat to list all indices in the current es cluster

Kibana’s Console: `GET /_cat/indices?v`
curl: `curl -XGET "127.0.0.1:9200/_cat/indices?v"`
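
The same _cat calls can be issued from a script. Here is a minimal sketch using only the Python standard library; the helper names `cat_url` and `cat` are hypothetical, and the default host is the 127.0.0.1:9200 address used in the curl commands above.

```python
from urllib.request import urlopen

def cat_url(endpoint, host="127.0.0.1:9200", verbose=True):
    """Build a _cat URL such as http://127.0.0.1:9200/_cat/health?v."""
    url = "http://{}/_cat/{}".format(host, endpoint)
    # The "v" flag asks for a header row in the plain-text table.
    return url + "?v" if verbose else url

def cat(endpoint, **kwargs):
    """Fetch a _cat endpoint and return its plain-text table.

    This needs a reachable cluster, so it is defined here but not called.
    """
    with urlopen(cat_url(endpoint, **kwargs)) as resp:
        return resp.read().decode("utf-8")

print(cat_url("health"))
print(cat_url("indices"))
```

With a cluster running, `print(cat("health"))` would show the same table as the curl command.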

_search API query examples

Create an index

PUT /customer?pretty

output:

{
  "acknowledged": true,
  "shards_acknowledged": true
}

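Insert data
In day-to-day work, inserting data into es sometimes fails with a 504 Gateway Timeout. When that happens, a small amount of data has to be inserted manually.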

PUT /rta_daily_report/campaign/164983850_rba_20170808?pretty
{
  "cid": 164983850,
  "advertiser_id": 799,
  "trace_app_id": "com.zeptolab.cats.google",
  "network_cid": "6656665",
  "platform": 1,
  "direct": 2,
  "last_second_domain": "",
  "jump_type": 2,
  "direct_trace_app_id": "",
  "mode": 0,
  "third": "kuaptrk.com",
  "hops": 9,
  "yyyymmdd": "2017-08-07T16:00:00",
  "type": "rba",
  "click": 2
}

output:

{
  "_index": "rta_daily_report",
  "_type": "campaign",
  "_id": "164983850_rba_20170808",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
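
When more than one document has to be re-inserted by hand, one option is to send them together through the `_bulk` API, whose body is newline-delimited JSON: an action line followed by the document source, repeated for each document. The sketch below only builds that body; the helper name `bulk_index_body` is hypothetical, and the sample document is an abridged version of the one above.

```python
import json

def bulk_index_body(index, doc_type, docs):
    """Build an NDJSON _bulk body from {doc_id: source} pairs."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Abridged sample document, for illustration only.
docs = {
    "164983850_rba_20170808": {"cid": 164983850, "type": "rba", "click": 2},
}

body = bulk_index_body("rta_daily_report", "campaign", docs)
print(body)
```

The resulting string would be sent as the body of `POST /_bulk`. Keeping each batch small is one way to stay clear of the gateway timeout.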

Delete data
Delete by a specific document_id:

DELETE /rta_daily_report/campaign/164983850_rba_20170808?pretty

Delete the documents that match a query:

POST rta_daily_report/_delete_by_query
{
  "query": { 
    "match": {
      "message": "some message"
    }
  }
}

Query by a specific document_id

GET rta_daily_report/campaign/145603275_m_normal_20170804?pretty

output:

{
  "_index": "rta_daily_report",
  "_type": "campaign",
  "_id": "145603275_m_normal_20170804",
  "_version": 1,
  "found": true,
  "_source": {
    "cid": 145603275,
    "advertiser_id": 457,
    "trace_app_id": "id1105855019",
    "network_cid": "plr_gs_ios_cn_osv9",
    "platform": 2,
    "direct": 1,
    "last_second_domain": "tracking.lenzmx.com",
    "jump_type": 7,
    "direct_trace_app_id": "id1105855019",
    "mode": 3,
    "third": "3444.tlnk.io",
    "hops": 1,
    "yyyymmdd": "2017-08-03T16:00:00",
    "type": "m_normal",
    "click": 2,
    "impression": 3,
    "revenue": 0,
    "install": 0
  }
}

Query all data
URI:

GET rta_daily_report/campaign/_search?q=*&pretty

request body:

GET rta_daily_report/campaign/_search
{
  "query": {
    "match_all": {}
  }  
}

output:

"hits": {
    "total": 2705059,
    "max_score": 1,
    "hits": [
      {
        "_index": "rta_daily_report",
        "_type": "campaign",
        "_id": "163016610_rba_20170801",
        "_score": 1,
        "_source": {
          "cid": 163016610,
          "advertiser_id": 799,
          "trace_app_id": "mappstreet.videoeditor",
          "network_cid": "6287283",
          "platform": 1,
          "direct": 2,
          "last_second_domain": "",
          "jump_type": 2,
          "direct_trace_app_id": "",
          "mode": 0,
          "third": "aff.adsbreak.com",
          "hops": 8,
          "yyyymmdd": "2017-07-31T16:00:00",
          "type": "rba",
          "click": 0
        }
      },
      ....]
      }

Query a specific field and specify the sort field
Search the rta_daily_report index for type:rba and return the results sorted by date in ascending order.
URI:

 GET rta_daily_report/_search?q=type:rba&sort=yyyymmdd:asc&pretty

request body:

GET rta_daily_report/_search
{
  "query": {
    "match": {
      "type": "rba"
    }
  },
  "sort": [
    {
      "yyyymmdd": {
        "order": "asc"
      }
    }
  ]
}
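
To show how the two request styles line up, here is a short Python sketch that produces both forms from the same inputs. The function names `uri_search` and `body_search` are hypothetical. The URI form goes through the `q` parameter, so the two are roughly, not strictly, the same search.

```python
from urllib.parse import urlencode

def uri_search(index, field, value, sort_field, order="asc"):
    """Build the URI form: /index/_search?q=field:value&sort=field:order."""
    params = {
        "q": "{}:{}".format(field, value),
        "sort": "{}:{}".format(sort_field, order),
    }
    # safe=":" keeps the colons readable instead of percent-encoding them.
    return "/{}/_search?{}".format(index, urlencode(params, safe=":"))

def body_search(field, value, sort_field, order="asc"):
    """Build the request-body form of the same search."""
    return {
        "query": {"match": {field: value}},
        "sort": [{sort_field: {"order": order}}],
    }

print(uri_search("rta_daily_report", "type", "rba", "yyyymmdd"))
print(body_search("type", "rba", "yyyymmdd"))
```

The URI form is handy for quick checks, while the body form scales better once filters, aggregations, or `_source` lists are added.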

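Specify output fields
Query documents whose type is rba or b2t, sort them by date in descending order, return only the specified fields, and limit the output to 5 results. To match a phrase instead, use "match_phrase": { "address": "mill lane" }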

GET rta_daily_report/_search
{
  "query": {
    "match": {
      "type": "rba b2t"
    }
  },
  "sort": [
    {
      "yyyymmdd": {
        "order": "desc"
      }
    }
  ],
  "_source": ["yyyymmdd", "type", "cid", "click", "revenue"],
  "size": 5
} 

output:

"hits": {
    "total": 1327184,
    "max_score": null,
    "hits": [
      {
        "_index": "rta_daily_report",
        "_type": "campaign",
        "_id": "54870921_b2t_20170804",
        "_score": null,
        "_source": {
          "revenue": 76500,
          "yyyymmdd": "2017-08-03T16:00:00",
          "type": "b2t",
          "click": 22616,
          "cid": 54870921
        },
        "sort": [
          1501776000000
        ]
      },

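Combine complex queries with bool
The following example queries the click and revenue data of all orders whose type is b2t and whose revenue is greater than 0.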

GET rta_daily_report/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "type": "b2t"
        }}
     ],
     "must_not": [
       {
         "range": {
           "revenue": {
             "lte": 0
           }
         }
       }
     ]
    }
  },
  "sort": [
    {
      "yyyymmdd": {
        "order": "desc"
      }
    }
  ],
  "_source": ["yyyymmdd", "type", "cid", "click", "revenue"],
  "size": 10       
}
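
The must / must_not logic of this request can be mimicked locally to see which documents survive. The Python sketch below is a simplification (it treats the match clause as an exact comparison and uses hypothetical sample documents), but it highlights one detail: a range clause only matches documents that actually have the field, so a document without revenue is not excluded by must_not.

```python
def matches(doc):
    """Mimic the bool query: must type == "b2t", must_not revenue <= 0."""
    must = doc.get("type") == "b2t"
    revenue = doc.get("revenue")
    # A range clause can only match documents that contain the field.
    must_not = revenue is not None and revenue <= 0
    return must and not must_not

# Hypothetical sample documents, used only for illustration.
docs = [
    {"cid": 1, "type": "b2t", "revenue": 76500},
    {"cid": 2, "type": "b2t", "revenue": 0},
    {"cid": 3, "type": "rba", "revenue": 10},
    {"cid": 4, "type": "b2t"},  # no revenue field at all
]

kept = [d["cid"] for d in docs if matches(d)]
print(kept)  # [1, 4]
```

If documents without a revenue field should be left out as well, a positive condition such as a range clause with "gt": 0 under must or filter would express "revenue greater than 0" more directly.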

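Aggregation queries
The following example is similar to an aggregate query in SQL: it computes the total install count for each type on each day.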

GET /rta_daily_report/_search
{
  "size": 0,
  "aggs": {
    "sum_install": {
      "date_histogram": {
        "field": "yyyymmdd",
        "interval": "day"
      },
      "aggs": {
        "types": {
          "terms": {
            "field": "type.keyword",
            "size": 10
          },
          "aggs": {
            "install": {
              "sum": {
                "field": "install"
              }
            }
          }
        }
      }
    }
  }
}

output

"aggregations": {
    "sum_install": {
      "buckets": [
        {
          "key_as_string": "2017-07-31T00:00:00.000Z",
          "key": 1501459200000,
          "doc_count": 659553,
          "types": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "rba",
                "doc_count": 321811,
                "install": {
                  "value": 73835
                }
              },
              {
                "key": "m_normal",
                "doc_count": 321711,
                "install": {
                  "value": 18964
                }
              },
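
Nested buckets like these are often easier to work with once flattened into rows, much like the result set of a SQL GROUP BY. Here is a minimal Python sketch; the function name `flatten` is hypothetical, and the sample input is copied from the part of the response shown above.

```python
def flatten(aggregations):
    """Turn day -> type -> sum(install) buckets into (day, type, install) rows."""
    rows = []
    for day in aggregations["sum_install"]["buckets"]:
        for type_bucket in day["types"]["buckets"]:
            rows.append((
                day["key_as_string"],
                type_bucket["key"],
                type_bucket["install"]["value"],
            ))
    return rows

# Sample taken from the (truncated) response above.
sample = {
    "sum_install": {
        "buckets": [
            {
                "key_as_string": "2017-07-31T00:00:00.000Z",
                "key": 1501459200000,
                "doc_count": 659553,
                "types": {
                    "buckets": [
                        {"key": "rba", "doc_count": 321811, "install": {"value": 73835}},
                        {"key": "m_normal", "doc_count": 321711, "install": {"value": 18964}},
                    ]
                },
            }
        ]
    }
}

rows = flatten(sample)
for row in rows:
    print(row)
```

Each row corresponds to one (day, type) group, which is what `SELECT day, type, SUM(install) ... GROUP BY day, type` would return.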

Script queries
The following example uses the click and install fields of a document to compute a value that is not stored in the document.

GET /rta_daily_report/campaign/_search?pretty
{
    "query" : {
      "bool": {
        "must": [
          {
            "range": {
              "click": {
                "gt": 0
              }
            }
          },
          {
            "range": {
              "install": {
                "gt": 0
              }
            }
          }
        ]
    }},
    "size": 100, 
    "script_fields": {
      "cti": {
        "script": {
          "lang": "painless",
          "inline": "1.0 * doc['install'].value / doc['click'].value"
        }
      }
    }
}

output

"hits": {
    "total": 23036,
    "max_score": 2,
    "hits": [
      {
        "_index": "rta_daily_report",
        "_type": "campaign",
        "_id": "160647918_rta_20170801",
        "_score": 2,
        "fields": {
          "cti": [
            0.0005970149253731343
          ]
        }
      },
      {
        "_index": "rta_daily_report",
        "_type": "campaign",
        "_id": "162293741_rta_20170801",
        "_score": 2,
        "fields": {
          "cti": [
            0.00007796055196070789
          ]
        }
      },
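
For comparison, the same per-document computation can be written in plain Python. This is only a local sketch with hypothetical input values; the function name `cti` follows the script field name used in the request, and the guard mirrors the two range clauses, which keep out documents where the ratio would be undefined or zero.

```python
def cti(doc):
    """Return install / click as a float, or None when the ratio is not meaningful."""
    click = doc.get("click", 0)
    install = doc.get("install", 0)
    # Mirrors the range clauses in the query: click > 0 and install > 0.
    if click <= 0 or install <= 0:
        return None
    return 1.0 * install / click

# Hypothetical values, used only for illustration.
print(cti({"click": 2000, "install": 1}))  # 0.0005
print(cti({"click": 0, "install": 3}))     # None
```

Multiplying by 1.0 first forces floating-point division, which is the same reason the Painless script starts with `1.0 *`.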

Aggregate data over a period of time

GET rta_daily_report/campaign/_search
{
  "size": 0,
  "aggs": {
    "snaptime": {
      "date_range": {
        "field": "@timestamp",
        "ranges": [
          {
            "from": "now-30d/d",
            "to": "now"
          }
        ]
      },
      "aggs": {
        "sum_revenue": {
          "sum": {
            "field": "revenue"
          }
        }
      }
    }
  }
}

output:

"aggregations": {
    "snaptime": {
      "buckets": [
        {
          "key": "2017-07-17T00:00:00.000Z-2017-08-16T03:30:16.995Z",
          "from": 1500249600000,
          "from_as_string": "2017-07-17T00:00:00.000Z",
          "to": 1502854216995,
          "to_as_string": "2017-08-16T03:30:16.995Z",
          "doc_count": 18685619,
          "sum_revenue": {
            "value": 6631665219
          }
        }
      ]
    }
  }
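
The "from" value `now-30d/d` is a date math expression: take the current time, subtract 30 days, then round down to the start of that day. The Python sketch below reproduces the arithmetic for the response above, assuming UTC and taking "now" from the `to_as_string` value; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def now_minus_days_floor(now, days):
    """Compute the equivalent of "now-<days>d/d": subtract days, round down to midnight."""
    shifted = now - timedelta(days=days)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)

# "now" as reported by to_as_string in the response above.
now = datetime(2017, 8, 16, 3, 30, 16, 995000, tzinfo=timezone.utc)
start = now_minus_days_floor(now, 30)

print(start.isoformat())
print(int(start.timestamp() * 1000))
```

The result lines up with the `from_as_string` and `from` values of the bucket, so the range covers 30 full days plus the part of the current day that has already elapsed.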

Aggregate data over a period of time and compute an extra field with a script

GET rta_daily_report/campaign/_search
{

  "size": 0,
  "aggs" : {
    "cvr_per_month" : {
      "date_range" : {
        "field": "@timestamp",
        "ranges": [
          {
            "from": "now-30d/d",
            "to": "now"
          }
        ]
      },
      "aggs": {
        "sum_click": {
          "sum": {
            "field": "click"
          }
        },
        "sum_install": {
          "sum": {
            "field": "install"
          }
        },
        "cvr": {
          "bucket_script": {
            "buckets_path": {
              "install": "sum_install",
              "click": "sum_click"
            },
           "script": "1.0 * params.install / params.click"
          }
        }
      }
    }
  }
}

output: 
"aggregations": {
    "cvr_per_month": {
      "buckets": [
        {
          "key": "2017-07-17T00:00:00.000Z-2017-08-16T03:37:22.732Z",
          "from": 1500249600000,
          "from_as_string": "2017-07-17T00:00:00.000Z",
          "to": 1502854642732,
          "to_as_string": "2017-08-16T03:37:22.732Z",
          "doc_count": 18685619,
          "sum_click": {
            "value": 15067388421
          },
          "sum_install": {
            "value": 7602055
          },
          "cvr": {
            "value": 0.0005045370032012133
          }
        }
      ]
    }
  }
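
The cvr value can be checked by hand from the two sums in the response. The sketch below does that in Python and also contrasts it with averaging per-document ratios, which is a different quantity; the small `docs` list is hypothetical and exists only to show the difference.

```python
# Sums taken from the response above.
sum_click = 15067388421
sum_install = 7602055

# Same expression as the bucket_script: 1.0 * params.install / params.click
cvr = 1.0 * sum_install / sum_click
print(cvr)

# Hypothetical (click, install) pairs, for illustration only.
docs = [(1000, 1), (10, 1)]
ratio_of_sums = sum(i for _, i in docs) / sum(c for c, _ in docs)
mean_of_ratios = sum(i / c for c, i in docs) / len(docs)
print(ratio_of_sums, mean_of_ratios)
```

bucket_script works on the aggregated sums, so it yields the ratio of sums. Documents with many clicks therefore weigh more than they would in an average of per-document rates.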


kibana

logstash

TODO:

FAQ
