Elasticsearch | Logstash | Getting Started

2020-06-02  乌鲁木齐001号程序员

Logstash

Logstash is a data collection and processing engine with more than 200 plugins.


[Figure: Logstash]

Logstash | Architecture Overview

Codec (Code / Decode): decodes raw data into an Event, then encodes the Event into the target data format.
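To make the decode / encode flow concrete, here is a minimal sketch of a pipeline file. The choice of stdin and stdout, and of the json and rubydebug codecs, is only an assumption for illustration and is not taken from the figures in this post.

```
input {
  # the input codec decodes each raw line from stdin into an Event
  stdin { codec => json }
}

filter {
  # optional: Events can be modified here
}

output {
  # the output codec encodes each Event before it is written out
  stdout { codec => rubydebug }
}
```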


[Figure: Logstash architecture]

Logstash | Key Concepts

Pipeline
Event

Input Plugins

A Pipeline can have multiple input plugins.
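As a rough sketch, an input section with two plugins could look like this. The log path is a hypothetical placeholder, not something from this post.

```
input {
  # read lines typed into the terminal
  stdin {}

  # also read lines from log files (hypothetical path)
  file {
    path => "/var/log/example/*.log"
    start_position => "beginning"
  }
}

output {
  stdout { codec => rubydebug }
}
```

Events from both inputs flow into the same filter and output stages.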

Input Plugin | File

Output Plugin

An output plugin sends Events to a specific destination. It is the last stage of a Pipeline.
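A minimal sketch of an output section with several destinations follows. The index name and the file path are made-up values for illustration.

```
output {
  # print each Event to the console
  stdout { codec => rubydebug }

  # index each Event into Elasticsearch (index name is hypothetical)
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "example-index"
  }

  # append each Event to a local file (hypothetical path)
  file {
    path => "/tmp/example-output.log"
  }
}
```

Each Event is sent to every output listed in the section.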

Codec Plugin

A codec plugin decodes raw data into an Event and encodes an Event into the target data format.

Filter Plugin

A filter plugin can process a Logstash Event in many ways, for example parsing fields, removing fields, and converting types.
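The three kinds of processing mentioned above can be sketched roughly as follows. This assumes the input is an Apache access log line and that grok produces its legacy (non-ECS) field names such as timestamp; neither assumption comes from this post.

```
filter {
  # parse the raw line into named fields
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }

  # convert the parsed time string and use it as the Event's @timestamp
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }

  # remove a field that is no longer needed
  mutate {
    remove_field => ["message"]
  }
}
```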

Filter Plugin | Mutate

The mutate filter performs various operations on fields.
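A short sketch of a few mutate operations is shown here. All field names (hostname, bytes, path, method, tmp) are hypothetical.

```
filter {
  mutate {
    # rename a field
    rename => { "hostname" => "host_name" }
    # change a field's type
    convert => { "bytes" => "integer" }
    # replace every "/" in the path field with "_"
    gsub => ["path", "/", "_"]
    # lowercase a string field
    lowercase => ["method"]
    # delete a field
    remove_field => ["tmp"]
  }
}
```

Within a single mutate block the operations run in the plugin's own fixed order, not in the order they are written. When the order matters, splitting the work into several mutate blocks is the usual approach.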

Queue

[Figure: Queue]
In Memory Queue

With an in-memory queue, a process crash or a machine outage loses the data held in the queue.

Persistent Queue

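With a persistent queue, Events are buffered on disk, so a process crash or a machine outage does not lose them. Data is guaranteed to be consumed, which lets the queue take over the buffering role of a message queue such as Kafka.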
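The queue type is chosen in the Logstash settings file rather than in the pipeline file. A minimal sketch of the relevant logstash.yml entries, with a size limit picked only as an example:

```
# logstash.yml
# switch from the default in-memory queue to the on-disk queue
queue.type: persisted
# upper bound on the disk space the queue may use (example value)
queue.max_bytes: 1024mb
```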


Codec Plugins - Single Line | A Few Examples
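The original examples for this section are not preserved here, so the following is only a sketch of how single-line codecs are typically tried out. The commented-out lines are alternatives to swap in one at a time.

```
input {
  # line: each line of input becomes one Event, stored in the message field
  stdin { codec => line }
  # alternative: parse each line as JSON into separate fields
  # stdin { codec => json }
}

output {
  # rubydebug: pretty-print the whole Event
  stdout { codec => rubydebug }
  # alternative: print one dot per Event, handy for watching throughput
  # stdout { codec => dots }
}
```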

Codec Plugin - Multiline | An Example

Run multiline-exception.conf

sudo bin/logstash -f multiline-exception.conf

Contents of multiline-exception.conf
input {
  stdin {
    codec => multiline {
      # a line that starts with whitespace ...
      pattern => "^\s"
      # ... is appended to the previous line
      what => "previous"
    }
  }
}


filter {}

output {
  stdout { codec => rubydebug }
}
Enter a Java stack trace as multi-line input
Exception in thread "main" java.lang.NullPointerException
        at com.example.myproject.Book.getTitle(Book.java:16)
        at com.example.myproject.Author.getBookTitles(Author.java:25)
        at com.example.myproject.Bootstrap.main(Bootstrap.java:14)
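Then enter a line that starts with a letter

hello world

Because this line does not start with whitespace, it begins a new Event, and Logstash outputs the buffered Java stack trace above as a single Event.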
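The same codec can also be configured the other way around. As a sketch, assuming log lines that start with an ISO 8601 timestamp (an assumption, not the format used in this post), negate tells the codec that every line which does not match the pattern belongs to the previous line:

```
input {
  stdin {
    codec => multiline {
      # a new Event starts at a line that begins with a timestamp
      pattern => "^%{TIMESTAMP_ISO8601} "
      # lines that do NOT match the pattern ...
      negate => true
      # ... are appended to the previous line
      what => "previous"
    }
  }
}

output {
  stdout { codec => rubydebug }
}
```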


Walking Through a logstash.conf

input {
  file {
    path => "/home/lixinlei/data/movielens/ml-latest-small/movies.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
Sample rows from movies.csv, the file read by the input section:
movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
6,Heat (1995),Action|Crime|Thriller
7,Sabrina (1995),Comedy|Romance
...
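filter {
  # Parse each CSV line into three named fields
  csv {
    separator => ","
    columns => ["id","content","genre"]
  }

  # Split genre on "|" into an array and drop fields that are not needed
  mutate {
    split => { "genre" => "|" }
    remove_field => ["path", "host","@timestamp","message"]
  }

  # Split content on "(" so that [content][0] is the title and
  # [content][1] is the year followed by ")", for example "1995)"
  mutate {
    split => ["content", "("]
    add_field => { "title" => "%{[content][0]}"}
    add_field => { "year" => "%{[content][1]}"}
  }

  # Convert year to an integer, trim whitespace around title, and remove content
  mutate {
    convert => {
      "year" => "integer"
    }
    strip => ["title"]
    remove_field => ["path", "host","@timestamp","message","content"]
  }

}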
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "movies"
    document_id => "%{id}"
  }
  stdout {}
}
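One thing the config above does not handle, as far as I can tell, is the header row (movieId,title,genres), which is read and parsed like any other line. A possible sketch for skipping it uses a conditional with the drop filter. It would have to sit at the top of the filter section, before the message field is removed.

```
filter {
  # discard the CSV header line before any other processing
  if [message] =~ /^movieId,/ {
    drop {}
  }
}
```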