Crawler log collection (Flume + Kafka + ELK)
(1) Flume 1.6
1.1 Flume configuration (ship logs to HDFS for offline analysis and to Kafka for real-time analysis)
a1.sources = r1
a1.sinks = k2 k1
a1.channels = c2 c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command=tail -n +0 -f /usr/lang/log.log
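# List both channels on one line; a repeated key keeps only the last value,
# which would leave c1 (and the HDFS sink) without any events
a1.sources.r1.channels = c1 c2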
# Describe the sinks (k1: HDFS for offline analysis, k2: Kafka for real-time analysis)
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://lang:8020/user/flume
a1.sinks.k1.hdfs.filePrefix = events-
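# Rounding only affects time escape sequences (such as %Y-%m-%d) in hdfs.path;
# the path above contains none, so these three settings currently have no effect
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute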
a1.sinks.k2.channel=c2
a1.sinks.k2.type=org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k2.topic=lang
a1.sinks.k2.brokerList=node1:9092
a1.sinks.k2.requiredAcks=1
a1.sinks.k2.batchSize=20
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
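A Flume agent file is a plain Java properties file, so a key that appears twice silently keeps only its last value, and a channel that no source feeds produces no error at startup. The following is a minimal sketch of a sanity check for that kind of wiring mistake. The function names (`parse_properties`, `last`, `check_agent`) are hypothetical helpers written for this note, not part of Flume.

```python
from collections import defaultdict


def parse_properties(text):
    """Collect every value seen for each key, so repeated keys stay visible."""
    values = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()].append(value.strip())
    return values


def last(props, key):
    """Return the last value assigned to key, or an empty string if unset."""
    vals = props.get(key)
    return vals[-1] if vals else ""


def check_agent(text, agent="a1"):
    """Return a list of human-readable wiring problems for one agent."""
    props = parse_properties(text)
    problems = []
    # Repeated keys: only the final assignment survives.
    for key, vals in props.items():
        if len(vals) > 1:
            problems.append(
                f"{key} is set {len(vals)} times; only the last value ({vals[-1]!r}) is kept"
            )
    declared = set(last(props, f"{agent}.channels").split())
    # Channels that at least one source writes to.
    fed = set()
    for source in last(props, f"{agent}.sources").split():
        fed.update(last(props, f"{agent}.sources.{source}.channels").split())
    # Every sink must read from a declared channel.
    for sink in last(props, f"{agent}.sinks").split():
        channel = last(props, f"{agent}.sinks.{sink}.channel")
        if channel not in declared:
            problems.append(f"sink {sink} reads from undeclared channel {channel!r}")
    for channel in sorted(declared - fed):
        problems.append(f"channel {channel} is declared but no source writes to it")
    return problems
```

Calling `check_agent(open("conf/flume-conf").read())` before starting the agent prints nothing alarming only when every declared channel has a source and every sink has a declared channel.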
1.2 Starting Flume
bin/flume-ng agent -c conf -f conf/flume-conf -n a1 -Dflume.root.logger=DEBUG,console
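Once the agent is running, appending a few lines to the tailed file is a quick way to confirm that events reach both sinks. This is a minimal sketch; the line layout (timestamp, level, message) is an assumption about what the crawler writes, and `format_line` and `append_line` are hypothetical helper names.

```python
from datetime import datetime


def format_line(ts, level, message):
    """Build one log line that starts with a 'yyyy-MM-dd HH:mm:ss' timestamp."""
    return f"{ts:%Y-%m-%d %H:%M:%S} {level} {message}"


def append_line(path, line):
    """Append a single line to the log file that the exec source is tailing."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")


# Example (path taken from the Flume config above):
# append_line("/usr/lang/log.log", format_line(datetime.now(), "INFO", "flume smoke test"))
```

With DEBUG console logging enabled, the agent output should show the event being taken from the channels shortly after the line is appended.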
(2) Kafka 0.11 cluster
2.1 Key configuration files
server.properties:
broker.id=0 (must be unique per broker: assign 0, 1, 2 according to the host)
listeners=PLAINTEXT://:9092
zookeeper.connect=192.168.205.11:2181,192.168.205.12:2181,192.168.205.13:2181
producer.properties:
bootstrap.servers=192.168.205.11:9092,192.168.205.12:9092,192.168.205.13:9092
consumer.properties:
zookeeper.connect=192.168.205.11:2181,192.168.205.12:2181,192.168.205.13:2181
2.2 Copy the configuration files to every node (changing broker.id on each)
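Since the files differ only in `broker.id`, they can be generated rather than edited by hand. The sketch below renders just the three settings listed in 2.1; a real `server.properties` carries more (for example `log.dirs`), and the helper names are hypothetical. The host list is assumed from the ZooKeeper and bootstrap addresses used in these notes.

```python
ZK_CONNECT = "192.168.205.11:2181,192.168.205.12:2181,192.168.205.13:2181"


def render_server_properties(broker_id, zk_connect=ZK_CONNECT, port=9092):
    """Render the per-broker part of server.properties as text."""
    lines = [
        f"broker.id={broker_id}",
        f"listeners=PLAINTEXT://:{port}",
        f"zookeeper.connect={zk_connect}",
    ]
    return "\n".join(lines) + "\n"


def render_cluster(hosts):
    """Map each host to its own config text, numbering brokers from 0."""
    return {host: render_server_properties(i) for i, host in enumerate(hosts)}


if __name__ == "__main__":
    hosts = ["192.168.205.11", "192.168.205.12", "192.168.205.13"]
    for host, text in render_cluster(hosts).items():
        print(f"# {host}\n{text}")
```

Each rendered text can then be copied to the matching host by whatever means is already in use (scp, rsync, and so on).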
2.3 Common commands
Start ZooKeeper first
Start Kafka: bin/kafka-server-start.sh config/server.properties &
Stop Kafka: bin/kafka-server-stop.sh
Create a topic: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic lang
List topics: bin/kafka-topics.sh --list --zookeeper localhost:2181
Describe a topic: bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic lang
Producer: bin/kafka-console-producer.sh --broker-list node1:9092 --topic lang
Consumer: bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic lang --from-beginning
Delete a topic: bin/kafka-topics.sh --delete --zookeeper 130.51.23.95:2181 --topic topicname (requires delete.topic.enable=true in server.properties; in 0.11 the default is false, so otherwise the topic is only marked for deletion)
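When topics are created from a script, it can help to build the command in one place and reject a replication factor larger than the number of brokers, which Kafka refuses anyway. This is a minimal sketch using only the flags shown above; `create_topic_cmd`, `run`, and the `broker_count` default of 3 are assumptions made for illustration.

```python
import subprocess


def create_topic_cmd(topic, zookeeper="localhost:2181", partitions=1,
                     replication_factor=1, broker_count=3):
    """Build the argv list for 'kafka-topics.sh --create'."""
    if replication_factor > broker_count:
        raise ValueError(
            f"replication factor {replication_factor} exceeds broker count {broker_count}"
        )
    return [
        "bin/kafka-topics.sh", "--create",
        "--zookeeper", zookeeper,
        "--replication-factor", str(replication_factor),
        "--partitions", str(partitions),
        "--topic", topic,
    ]


def run(cmd, kafka_home):
    """Run the command from the Kafka install directory (hypothetical path)."""
    return subprocess.run(cmd, cwd=kafka_home, check=True)
```

For example, `run(create_topic_cmd("lang"), "/opt/kafka")` would issue the same create command as above, assuming Kafka is unpacked under `/opt/kafka`.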
(3) Logstash 5.5.1
3.1 Configuration (file input, Elasticsearch output)
input {
  file {
    path => ["/usr/lang/log.log"]
    start_position => "beginning"
  }
}
filter {
  date {
    match => [ "timestamp" , "YYYY-MM-dd HH:mm:ss" ]
  }
}
output {
  elasticsearch {
    hosts => ["192.168.205.14:9200"]
  }
  stdout {
    codec => rubydebug
  }
}
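Note that the date filter only acts on events that already carry a `timestamp` field; the configuration above has no step (such as grok) that extracts one from the raw line, so it would need one for the filter to have any effect. The sketch below mirrors in Python what such an extraction plus the date match would do. It assumes each log line begins with a `yyyy-MM-dd HH:mm:ss` timestamp, which is a guess about the crawler's log format, and `split_event` is a hypothetical name.

```python
import re
from datetime import datetime

# Leading timestamp, then the rest of the line.
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$")


def split_event(line):
    """Return (timestamp, rest); timestamp is None if it is missing or invalid."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None, line
    try:
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Looks like a timestamp but is not a real date (e.g. month 13).
        return None, line
    return ts, match.group(2)
```

Lines that fail to parse keep their original text, which is roughly how Logstash behaves when it tags an event with a date parse failure instead of dropping it.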
3.2 Configuration (Kafka input, Elasticsearch output)
input {
  kafka {
    #workers => 2
    bootstrap_servers => "node1:9092,node2:9092,node3:9092" # Kafka broker addresses (not ZooKeeper)
    topics => ["lang"] # Kafka topic to consume; remember to create it first
    #group_id => "logstash" # defaults to "logstash"
    #consumer_threads => 2 # number of consumer threads
    #reset_beginning => false
    #reset_beginning => true
    #decorate_events => true # adds Kafka metadata to each event: message size, source topic, and consumer group
    #type => "nginx-access-log"
  }
}
filter {
  date {
    match => [ "timestamp" , "YYYY-MM-dd HH:mm:ss" ]
  }
}
output {
  elasticsearch {
    hosts => ["192.168.205.14:9200"]
    #index => "kafakindex-%{+YYYY.MM.dd}"
  }
  stdout {
    codec => rubydebug
  }
}
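The `%{+YYYY.MM.dd}` part of an index name is filled in from the event's `@timestamp`, which Logstash keeps in UTC; when `index` is left unset, the Elasticsearch output defaults to `logstash-%{+YYYY.MM.dd}`. A minimal sketch of how the daily index name comes out, useful for seeing why an event logged early in the morning in UTC+8 lands in the previous day's index. `daily_index` is a hypothetical helper, not a Logstash API.

```python
from datetime import datetime, timedelta, timezone


def daily_index(prefix, event_time):
    """Return the daily index name for a timezone-aware event time.

    The date is taken in UTC, matching how the index pattern is expanded.
    """
    utc = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc:%Y.%m.%d}"


if __name__ == "__main__":
    cst = timezone(timedelta(hours=8))  # UTC+8
    print(daily_index("logstash", datetime(2017, 8, 2, 7, 30, tzinfo=cst)))
```

Here 07:30 on 2 August in UTC+8 is still 23:30 on 1 August in UTC, so the event goes to the 1 August index.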
(4) Elasticsearch
4.1 Memory settings: config/jvm.options
4.2 Configuration file: config/elasticsearch.yml
cluster.name: my-application
node.name: node-1 (must be different on each node in the cluster)
network.host: 192.168.205.14
http.port: 9200
bootstrap.system_call_filter: false
http.cors.enabled: true
http.cors.allow-origin: "*"
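After starting a node with these settings, the `_cluster/health` endpoint gives a quick confirmation that it is up and that the cluster name matches. The sketch below is a small check under the assumption that the node is reachable at the address configured above; `fetch_health` and `summarize_health` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://192.168.205.14:9200", timeout=5):
    """GET /_cluster/health and return the decoded JSON body."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def summarize_health(health):
    """Condense the health document into one line."""
    return (
        f"{health['cluster_name']}: {health['status']} "
        f"({health['number_of_nodes']} node(s))"
    )


# Example: print(summarize_health(fetch_health()))
```

A status of `yellow` is normal on a single node, because replica shards have nowhere to be allocated.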
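4.3 Caveats: check the Java heap settings (-Xms/-Xmx in config/jvm.options) and the whitespace in the config file (YAML needs a space after each colon, and indentation must use spaces, not tabs)
4.4 elasticsearch-head (web UI for managing indices); the http.cors.* settings in 4.2 are what allow it to connect
(5) Kibana
Nothing special here, just start it.
If you have questions, contact me directly. QQ: 1146941596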