ElasticSearch | Ingest Pipeline

2020-05-28  本文已影响0人  乌鲁木齐001号程序员

Ingest Node

Ingest Node vs Logstash

  Logstash Ingest Node
数据输入与输出 支持从不同的数据源读取,并写入不同的数据源 支持从 ES REST API 获取数据并且写入 ElasticSearch
数据缓冲 实现了简单的数据队列,支持重写 不支持缓冲
数据处理 支持大量的插件,也支持定制开发 内置的插件,可以开发 Plugin 进行扩展(Plugin 更新需要重启)
配置和使用 增加了一定的架构复杂度 无需额外部署

Pipeline & Processor

Pipeline.png

在 Ingest Node 中可以定义 Pipeline

Pipeline
Processor

ElasticSearch | 内置 Processor

Pipeline | 举个栗子

准备数据
PUT tech_blogs/_doc/1
{
  "title":"Introducing big data......",
  "tags":"hadoop,elasticsearch,spark",
  "content":"You konw, for big data"
}
_simulate API | 将字段的值用 "," 分割
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "title": "Introducing big data......",
        "tags": "hadoop,elasticsearch,spark",
        "content": "You konw, for big data"
      }
    },
    {
      "_index": "index",
      "_id": "idxx",
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "openstack,k8s",
        "content": "You konw, for cloud"
      }
    }
  ]
}
_simulate API | 在 Pipeline 中再添加一个 Processor
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "to split blog tags",
    "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        "set":{
          "field": "views",
          "value": 0
        }
      }
    ]
  },

  "docs": [
    {
      "_index":"index",
      "_id":"id",
      "_source":{
        "title":"Introducing big data......",
  "tags":"hadoop,elasticsearch,spark",
  "content":"You konw, for big data"
      }
    },
    {
      "_index":"index",
      "_id":"idxx",
      "_source":{
        "title":"Introducing cloud computering",
  "tags":"openstack,k8s",
  "content":"You konw, for cloud"
      }
    }
  ]
}
在 ElasticSearch 中添加一个 Pipeline
PUT _ingest/pipeline/blog_pipeline
{
  "description": "a blog pipeline",
  "processors": [
      {
        "split": {
          "field": "tags",
          "separator": ","
        }
      },
      {
        "set":{
          "field": "views",
          "value": 0
        }
      }
    ]
}
查看 Pipeline
GET _ingest/pipeline/blog_pipeline
测试 Pipeline
POST _ingest/pipeline/blog_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Introducing cloud computering",
        "tags": "openstack,k8s",
        "content": "You konw, for cloud"
      }
    }
  ]
}
使用 Pipeline 和不使用 Pipeline 向索引中添加数据
PUT tech_blogs/_doc/1
{
  "title":"Introducing big data......",
  "tags":"hadoop,elasticsearch,spark",
  "content":"You konw, for big data"
}

PUT tech_blogs/_doc/2?pipeline=blog_pipeline
{
  "title": "Introducing cloud computering",
  "tags": "openstack,k8s",
  "content": "You konw, for cloud"
}

POST tech_blogs/_search
{}
使用 blog_pipeline 重建 tech_blogs 中的所有文档
POST tech_blogs/_update_by_query?pipeline=blog_pipeline
{
}
增加 update_by_query 的条件
POST tech_blogs/_update_by_query?pipeline=blog_pipeline
{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "views"
                }
            }
        }
    }
}
上一篇 下一篇

猜你喜欢

热点阅读