ELK Deployment Summary
The deployment uses ELK + Kafka + Filebeat. Since this is not a large-scale system, everything runs as a single node; no ES/Kafka cluster was built. Filebeat and Kafka are deployed on the public network, while the ELK node sits on the LAN and pulls data from Kafka.
Environment
filebeat
- Collects service logs (JSON / plain text)
- Ubuntu 14.04
- filebeat-6.2.3
kafka
- kafka_2.12-2.1.0
- CentOS 7.6.1810
- kafka-manager-1.3.3.22
ELK
- Ubuntu 16.04.1
- Kibana 5.6.12
- Elasticsearch 5.6.12
- Logstash 5.6.12
Architecture
Public network
- Filebeat collector nodes ship data to the Kafka node
- Kafka, located on the public network, receives the data produced by Filebeat
LAN
- The ELK node is built on the internal network; Logstash consumes data from the public Kafka node
Kafka
Kafka is high-throughput message-queue middleware, commonly used in big-data architectures. Its high throughput, however, also makes it demanding on bandwidth, in terms of both stability and transfer speed (foreshadowing here; more on this later).
The reasoning behind using Kafka in this deployment:
- Kibana is the most convenient way to present the honeypot data
- The ELK node is resource-hungry, and for data-security reasons it has to stay on-premises
- The data has to travel from the public network back to the local network, and the public data nodes are widely scattered
- So the data is first gathered at one central node (Kafka), and then, taking advantage of Kafka's message-queue strengths, shipped back together to the local ELK stack for display and analysis
Installation
Recent Kafka releases already bundle ZooKeeper, so downloading the Kafka release package is all that is needed.
wget https://archive.apache.org/dist/kafka/2.1.0/kafka_2.12-2.1.0.tgz
tar xvf kafka_2.12-2.1.0.tgz
Inspect the Kafka directory
[root@iZj6c5bdyyg7se9hbjaakuZ kafka2.12-2.1.0]# ls
bin config libs LICENSE logs NOTICE site-docs
[root@iZj6c5bdyyg7se9hbjaakuZ kafka2.12-2.1.0]# ls ./bin/
connect-distributed.sh kafka-console-producer.sh kafka-log-dirs.sh kafka-run-class.sh kafka-verifiable-producer.sh zookeeper-shell.sh
connect-standalone.sh kafka-consumer-groups.sh kafka-mirror-maker.sh kafka-server-start.sh trogdor.sh
kafka-acls.sh kafka-consumer-perf-test.sh kafka-preferred-replica-election.sh kafka-server-stop.sh windows
kafka-broker-api-versions.sh kafka-delegation-tokens.sh kafka-producer-perf-test.sh kafka-streams-application-reset.sh zookeeper-security-migration.sh
kafka-configs.sh kafka-delete-records.sh kafka-reassign-partitions.sh kafka-topics.sh zookeeper-server-start.sh
kafka-console-consumer.sh kafka-dump-log.sh kafka-replica-verification.sh kafka-verifiable-consumer.sh zookeeper-server-stop.sh
[root@iZj6c5bdyyg7se9hbjaakuZ kafka2.12-2.1.0]# ls ./config/
connect-console-sink.properties connect-file-sink.properties connect-standalone.properties producer.properties trogdor.conf
connect-console-source.properties connect-file-source.properties consumer.properties server.properties zookeeper.properties
connect-distributed.properties connect-log4j.properties log4j.properties tools-log4j.properties
[root@iZj6c5bdyyg7se9hbjaakuZ kafka2.12-2.1.0]#
Because both the ZooKeeper and Kafka configuration files reference the hostname, modify the hosts file before configuring anything:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
0.0.0.0 iZj6c5bdyyg7se9hbjaakuZ
ZooKeeper configuration file
# ZooKeeper data directory; it mainly stores broker, topic, and producer metadata
dataDir=/opt/datas/zookeeper
# Default port
clientPort=2181
maxClientCnxns=0
Kafka configuration file
# Any positive integer; in a cluster, make sure each broker id is unique
broker.id=1
# Kafka listener address; note that the hostname is used
listeners=PLAINTEXT://iZj6c5bdyyg7se9hbjaakuZ:9092
advertised.listeners=PLAINTEXT://iZj6c5bdyyg7se9hbjaakuZ:9092
# Performance settings
num.network.threads=5
num.io.threads=10
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
# Data directory; Kafka stores its data as files on disk
# one directory per partition (multiple partitions, multiple directories); in addition, consumed offsets (for older consumers) are recorded in ZooKeeper
log.dirs=/opt/datas/kafkalogs
# Performance settings
num.partitions=3
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
# ZooKeeper connection string; it runs on the same machine, so localhost with the default port 2181 is used
zookeeper.connect=localhost:2181
advertised.host.name=iZj6c5bdyyg7se9hbjaakuZ
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
Note: ZooKeeper and Kafka must be started in order: ZooKeeper first, then Kafka. When shutting down, stop Kafka first and ZooKeeper last, otherwise the data can end up corrupted.
Start the services
# Start ZooKeeper
[root@iZj6c5bdyyg7se9hbjaakuZ kafka2.12-2.1.0]# cd /opt/kafka2.12-2.1.0/bin/
[root@iZj6c5bdyyg7se9hbjaakuZ bin]# ./zookeeper-server-start.sh -daemon ../config/zookeeper.properties
# Start Kafka
[root@iZj6c5bdyyg7se9hbjaakuZ bin]# ./kafka-server-start.sh -daemon ../config/server.properties
# Check the processes: the first is ZooKeeper, the second is Kafka
[root@iZj6c5bdyyg7se9hbjaakuZ bin]# jps
653033 QuorumPeerMain
661787 Kafka
661826 Jps
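To shut down, reverse the order as noted above (run from the same bin directory):
# Stop Kafka first, then ZooKeeper
./kafka-server-stop.sh
./zookeeper-server-stop.sh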
Create the message topics
Two topics, corresponding to Luan Shijie's (lsj) cowrie and fanuc data respectively.
Both topics are created with one replica and 8 partitions. The replication factor cannot exceed the number of brokers, and since this is a single-node deployment the maximum replication factor is 1. Multiple partitions allow parallel consumption and horizontal scaling, hence 8 partitions here; too many partitions, however, add latency during data synchronization, so more is not always better. For a large-scale deployment with multiple replicas and partitions, consider switching the producer ACKs to asynchronous mode, which can save a lot of time.
The producer's delivery guarantee can be tuned with the request.required.acks setting (see the sketch after this list):
- 0: effectively asynchronous sending; the offset advances as soon as the message is sent out and production continues; equivalent to at-most-once
- 1: the offset advances once the leader replica has acknowledged the message, then production continues
- -1: the offset advances only after all replicas have acknowledged the message, then production continues
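For reference, a minimal producer-side sketch of this setting (note: request.required.acks belongs to the old Scala producer; with the newer Java producer the equivalent setting is acks, and the values map the same way):
# producer.properties (sketch)
# 0 = fire-and-forget, 1 = wait for the leader's ack, all/-1 = wait for all in-sync replicas
acks=1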
# Create the two topics // specify the ZooKeeper address, the replication factor, the number of partitions, and the topic name
./bin/kafka-topics.sh --create --zookeeper iZj6c5bdyyg7se9hbjaakuZ:2181 --replication-factor 1 --partitions 8 --topic lsj_fanuc
./bin/kafka-topics.sh --create --zookeeper iZj6c5bdyyg7se9hbjaakuZ:2181 --replication-factor 1 --partitions 8 --topic lsj_cowrie
# List the topics
./bin/kafka-topics.sh --zookeeper iZj6c5bdyyg7se9hbjaakuZ:2181 --list
Check the consumption side. There is no data yet; if there were, it would scroll rapidly in the console. Exit with Ctrl + C.
# Console consumer script, to inspect the consumed data // read (consume) from the very beginning
./kafka-console-consumer.sh --bootstrap-server iZj6c5bdyyg7se9hbjaakuZ:9092 --topic lsj_fanuc --from-beginning
./kafka-console-consumer.sh --bootstrap-server iZj6c5bdyyg7se9hbjaakuZ:9092 --topic lsj_cowrie --from-beginning
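To verify the pipeline end to end, a test message can be pushed from another terminal with the console producer (in Kafka 2.1 the producer still uses --broker-list rather than --bootstrap-server):
# Type a line and press Enter; it should show up in the consumer above
./kafka-console-producer.sh --broker-list iZj6c5bdyyg7se9hbjaakuZ:9092 --topic lsj_cowrie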
Check the ports: 2181 / 9092
[root@iZj6c5bdyyg7se9hbjaakuZ ~]# netstat -antup
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:39944 0.0.0.0:* LISTEN 661787/java
tcp 0 0 0.0.0.0:40461 0.0.0.0:* LISTEN 653033/java
tcp 0 0 0.0.0.0:2048 0.0.0.0:* LISTEN 3167/sshd
tcp 0 0 0.0.0.0:9092 0.0.0.0:* LISTEN 661787/java
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 653033/java
tcp 0 0 127.0.0.1:2181 127.0.0.1:54480 ESTABLISHED 653033/java
tcp 0 0 172.31.62.108:36518 100.100.30.25:80 ESTABLISHED 2920/AliYunDun
tcp 0 0 127.0.0.1:54480 127.0.0.1:2181 ESTABLISHED 661787/java
tcp 0 120 172.31.62.108:2048 111.193.52.213:34419 ESTABLISHED 662776/sshd: root@p
tcp 0 0 172.31.62.108:2048 114.249.23.35:32569 ESTABLISHED 661344/sshd: root@p
tcp 1 0 127.0.0.1:46822 127.0.0.1:9092 CLOSE_WAIT 661787/java
udp 0 0 0.0.0.0:68 0.0.0.0:* 2723/dhclient
udp 0 0 127.0.0.1:323 0.0.0.0:* 1801/chronyd
udp6 0 0 ::1:323 :::* 1801/chronyd
[root@iZj6c5bdyyg7se9hbjaakuZ ~]#
At this point the Kafka side (on Alibaba Cloud) is fully configured.
Other useful commands
Change the number of partitions
[root@iZj6c5bdyyg7se9hbjaakuZ bin]# ./kafka-topics.sh --zookeeper iZj6c5bdyyg7se9hbjaakuZ:2181 --alter --topic lsj_fanuc --partitions 8
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
WARNING: If partitions are increased for a topic that has a key, the partition logic or ordering of the messages will be affected
Adding partitions succeeded!
[root@iZj6c5bdyyg7se9hbjaakuZ bin]# ./kafka-topics.sh --zookeeper iZj6c5bdyyg7se9hbjaakuZ:2181 --alter --topic lsj_cowrie --partitions 8
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
WARNING: If partitions are increased for a topic that has a key, the partition logic or ordering of the messages will be affected
Adding partitions succeeded!
Delete a topic (after deletion the data is still there; the topic is only marked as unavailable. To remove it completely, go into the data directory set in the config file and delete it there.)
[root@iZj6c5bdyyg7se9hbjaakuZ bin]# ./kafka-topics.sh --delete --zookeeper iZj6c5bdyyg7se9hbjaakuZ:2181 --topic mysql_topic
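Whether the data is actually removed also depends on the broker setting below; it defaults to true in Kafka 2.1, while older releases only mark the topic for deletion unless it is enabled explicitly:
# server.properties
delete.topic.enable=true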
Describe a specific Kafka topic
[root@iZj6c5bdyyg7se9hbjaakuZ bin]# ./kafka-topics.sh --zookeeper iZj6c5bdyyg7se9hbjaakuZ:2181 --topic lsj_fanuc --describe
Handling Kafka-related errors
- Error while fetching metadata with correlation id : {LEADER_NOT_AVAILABLE}
- KeeperErrorCode = NoNode for [node path]
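A quick way to narrow these down (a sketch for the single-broker setup above): LEADER_NOT_AVAILABLE is usually a listener/hostname-resolution problem, and NoNode means the expected ZooKeeper path was never created, so inspect what the broker actually registered in ZooKeeper:
# List the registered brokers and check the advertised endpoint of broker 1
./zookeeper-shell.sh localhost:2181 ls /brokers/ids
./zookeeper-shell.sh localhost:2181 get /brokers/ids/1
# If the endpoint hostname does not resolve for the client, fix /etc/hosts or advertised.listeners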
filebeat
Filebeat is a lightweight log shipper. Unlike Logstash and other such tools it does not depend on a Java runtime (it is written in Go), and it can be rolled out to every honeypot node automatically via Ansible push scripts.
Download
Also an Elastic product, it can be downloaded from the official site
wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.3-linux-x86_64.tar.gz
tar xvf filebeat-6.2.3-linux-x86_64.tar.gz
Configure and run
vim filebeat.yml
filebeat.prospectors:
- type: log                    # the type is normally "log"
  enabled: true                # enable this prospector
  paths:
    - /home/cowrie/cowrie/log/cowrie*
  fields:                      # tag events with their destination topic
    log_topics: lsj_cowrie
- type: log
  enabled: true
  paths:
    - /root/fanucdocker_session/network_coding/log/mylog_*
  fields:
    log_topics: lsj_fanuc
output.kafka:
  enabled: true
  hosts: ["47.244.139.92:9092"]
  topic: '%{[fields][log_topics]}'
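Before starting, the configuration and the Kafka output can be sanity-checked with Filebeat's built-in test subcommands (available in the 6.x series):
# Validate the config file
/root/filebeat-6.2.3/filebeat test config -c /root/filebeat-6.2.3/filebeat.yml
# Check connectivity to the configured Kafka output
/root/filebeat-6.2.3/filebeat test output -c /root/filebeat-6.2.3/filebeat.yml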
Start
# Run in the foreground
/root/filebeat-6.2.3/filebeat -e -c /root/filebeat-6.2.3/filebeat.yml
# Run in the background
nohup /root/filebeat-6.2.3/filebeat -e -c /root/filebeat-6.2.3/filebeat.yml >/dev/null 2>&1 &
ELK
All ELK components use version 5.6.12.
elasticsearch
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.12.tar.gz
tar xvf elasticsearch-5.6.12.tar.gz
Configuration file: vim elasticsearch.yml
# Cluster name
cluster.name: lsj_elk
path.logs: /var/log/elk/elastic
network.host: 10.10.2.109
http.port: 9200
# Settings required by the ES-Head plugin
http.cors.enabled: true
http.cors.allow-origin: "*"
Start
# Foreground
./bin/elasticsearch
# Background (daemon)
./bin/elasticsearch -d
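Once it is up, a quick health check against the HTTP port confirms the node is reachable (a single node normally reports green or yellow):
curl 'http://10.10.2.109:9200/_cluster/health?pretty'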
kibana
Installed via apt-get
root@elk:/opt/elk# apt list | grep kibana
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
kibana/now 5.6.12 amd64 [installed,local]
root@elk:/opt/elk#
Configuration file: vim /etc/kibana/kibana.yml
server.port: 5601
server.host: "10.10.2.109"
server.name: "elk_lsj"
elasticsearch.url: "http://10.10.2.109:9200"
Start
systemctl start kibana
systemctl enable kibana
Check the port
root@elk:/opt/elk# netstat -antup | grep 5601
tcp 0 0 10.10.2.109:5601 0.0.0.0:* LISTEN 17774/node
root@elk:/opt/elk#
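Kibana's status API provides a further sanity check (a sketch; 5.x exposes /api/status):
# Returns the overall state plus per-plugin status as JSON
curl 'http://10.10.2.109:5601/api/status'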
logstash
wget https://artifacts.elastic.co/downloads/logstash/logstash-5.6.12.tar.gz
tar xvf logstash-5.6.12.tar.gz
Logstash configuration file logstash.yml
It needs no changes; only an input/output pipeline config has to be written according to the output requirements:
Below is the pipeline config that ingests from Kafka: vim logstash-5.6.12/config/kafka_logstash.conf
### Input section: controls the incoming data
# Although there is only one Kafka, the cowrie and fanuc logs use different formats, so they are handled separately, and each input must be distinguished by a unique client_id
input {
kafka {
bootstrap_servers => "47.244.139.92:9092"   # the Kafka broker(s) to bootstrap from; single-node Kafka, so just this one IP
client_id => "test_cowrie"                  # unique ID, required when there are multiple inputs
group_id => "test"                          # consumer group of this input
topics => ["lsj_cowrie"]                    # source topic(s); several can be listed, comma-separated
type => "Cowrie"                            # the input `type` field, used below to parse the two log types separately
consumer_threads => 8                       # parallel consumer threads; usually matched to the topic's partition count, 8 here
}
kafka {
bootstrap_servers => "47.244.139.92:9092"
client_id => "test_fanuc"
group_id => "test"
topics => ["lsj_fanuc"]
type => "Fanuc"
consumer_threads => 8
}
}
### Filter section: parses the data formats
filter {
if [type] == "Cowrie" {
# cowrie logs are JSON, so the json filter parses them automatically
json {
source => "message"
remove_field => ["message"]
# by default the parsed event keeps the message field holding the whole raw line, which is redundant, so it is removed
}
# after the JSON is parsed, geolocate the src_ip field using the offline GeoIP database
if [src_ip] {
geoip {
source => "src_ip"
target => "geoip"
database => "/opt/elk/GeoLite2-City/GeoLite2-City.mmdb"
add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
}
mutate {
convert => [ "[geoip][coordinates]", "float" ]
}
}
}
# Fanuc logs are plain text, so the grok filter is needed here
if [type] == "Fanuc" {
# several grok patterns are configured to cover the different log-line variants (logstash filter regex matching)
grok {
match => { "message" => "Connection from %{IP:src_ip}:%{BASE10NUM:src_port} closed." }
add_tag => [ "fanuc_connection_closed" ]
tag_on_failure => []
}
grok {
match => { "message" => "Accept new connection from %{IP:src_ip}:%{BASE10NUM:src_port}" }
add_tag => [ "fanuc_new_connection" ]
tag_on_failure => []
}
grok {
match => { "message" => "function:%{URIPATHPARAM:funcCode}" }
add_tag => [ "fanuc_FunctionCode" ]
tag_on_failure => []
}
grok {
match => { "message" => "request: %{BASE16NUM:request}" }
add_tag => [ "fanuc_request" ]
tag_on_failure => []
}
grok {
match => { "message" => "results: %{BASE16NUM:results}" }
add_tag => [ "fanuc_results" ]
tag_on_failure => []
}
# geolocate the src_ip field
if [src_ip] {
geoip {
source => "src_ip"
target => "geoip"
database => "/opt/elk/GeoLite2-City/GeoLite2-City.mmdb"
add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
}
mutate {
convert => [ "[geoip][coordinates]", "float" ]
}
}
}
}
### Output section: controls the outgoing data
output {
elasticsearch{
hosts => ["10.10.2.109:9200"]
index => "logstash_lsj_elk"
timeout => 300
}
}
About Logstash log parsing
- set format => json directly on the input
- use the json codec on the input (codec => "json"; a sketch follows this list)
- use the json filter plugin (the approach used above)
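For the codec approach, a minimal sketch of what the Kafka input would look like (not used in this deployment, which relies on the json filter above instead):
input {
  kafka {
    bootstrap_servers => "47.244.139.92:9092"
    topics => ["lsj_cowrie"]
    codec => "json"   # decode each Kafka record as JSON at input time
  }
}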
Start Logstash
root@elk:/opt/elk/logstash-5.6.12# pwd
/opt/elk/logstash-5.6.12
root@elk:/opt/elk/logstash-5.6.12# ./bin/logstash -f ./config/kafka_logstash.conf
# Run in the background (supervisor can also be used)
root@elk:/opt/elk/logstash-5.6.12# nohup ./bin/logstash -f ./config/kafka_logstash.conf >/dev/null 2>&1 &
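Before running it for real, the pipeline definition can be syntax-checked, and once data flows the target index should appear in Elasticsearch (a sketch):
# Check the pipeline config and exit
./bin/logstash -f ./config/kafka_logstash.conf --config.test_and_exit
# Confirm the index is being written on the ES side
curl 'http://10.10.2.109:9200/_cat/indices/logstash_lsj_elk?v'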
Errors && reflections
This dragged on for roughly two weeks; on Wednesday I finally found the cause: the lab's bandwidth is not enough, and Kafka's appetite for bandwidth is too big.
Which led to the following loop:
- Logstash pulls data from Kafka to consume
- Because of the bandwidth limit, Logstash can never finish consuming within the allotted time
- Kafka then decides Logstash has gone offline and actively drops the connection
- Logstash requests data from Kafka again, and the cycle repeats with no data ever flowing in
Later, back in my dorm on the residential connection, I found that the Alibaba Cloud instance has only 1 Mbps of bandwidth, i.e. a transfer rate of barely more than 100 KB/s, so that was hopeless.
(2019/03/22) Since the Alibaba Cloud instance was paid for a full year and should not go to waste, there are currently two feasible options:
- Drop Kafka and use Redis as the message queue instead
- Skip the message middleware entirely and proxy the local ELK Logstash port to a port on the Alibaba Cloud host (a rough sketch follows below)
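As an illustration of the second option only (an assumption on my part, not a finished design): an SSH reverse tunnel from the LAN ELK host could expose a local Logstash listener, e.g. the conventional Beats port 5044, on the Alibaba Cloud host so the public-network Filebeat nodes can ship to it directly.
# Run from the LAN ELK host; requires GatewayPorts yes in the cloud host's sshd_config
ssh -N -R 0.0.0.0:5044:localhost:5044 root@47.244.139.92
# Filebeat on the honeypots would then use output.logstash with hosts: ["47.244.139.92:5044"]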
Redis option
Todo
Logstash port-forwarding option
Todo