Flume: Agent application -- monitor and read log data in real time, store it in HDFS
2018-08-15
chengruru
I. Task description
1. Collect the Hive runtime log (source)
/opt/cloudera/hive/logs/hive.log
Command used to read the file content: tail -f
2. Buffer the events in memory (channel)
3. Store the data in HDFS (sink)
II. Create the agent file
Create an agent configuration file named agent.conf:
$ cd flume/conf
$ sudo cp flume-conf.properties.template agent.conf
$ sudo vim agent.conf
Edit agent.conf so that it contains the following:
# Define an agent named a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /opt/cloudera/hive/logs/hive.log
a1.sources.r1.shell = /bin/sh -c
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://Master:9000/user/hadoop/flume/hive-log
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.batchSize = 10
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
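Every property above starts with a1, which is the agent name. The value passed to flume-ng with --name has to be that same name, otherwise the agent finds no configuration to load. A minimal sketch of a pre-flight check follows; agent_declared is a hypothetical helper written for this article, not part of Flume.

```shell
# Hypothetical helper (not part of Flume): succeeds only if the properties
# file $1 declares a sources list for the agent named $2.
agent_declared() {
  grep -Eq "^[[:space:]]*$2\.sources[[:space:]]*=" "$1"
}
```

For example, `agent_declared conf/agent.conf a1 && echo "agent name matches"` can be run from the flume directory before starting the agent.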
III. Run the agent
$ cd flume
$ ./bin/flume-ng agent \
> --conf conf \
> --name a1 \
> --conf-file conf/agent.conf
Output:
2018-08-15 05:13:20,988 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:251)] Creating hdfs://Master:9000/user/hadoop/flume/hive-log/FlumeData.1534335148147.tmp
2018-08-15 05:13:21,002 (hdfs-k1-call-runner-7) [DEBUG - org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas(AbstractHDFSWriter.java:200)] Using getNumCurrentReplicas--HDFS-826
2018-08-15 05:13:21,002 (hdfs-k1-call-runner-7) [DEBUG - org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetDefaultReplication(AbstractHDFSWriter.java:228)] Using FileSystem.getDefaultReplication(Path) from HADOOP-8014
2018-08-15 05:13:21,006 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:618)] rolling: rollCount: 10, events: 10
2018-08-15 05:13:21,007 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:393)] Closing hdfs://Master:9000/user/hadoop/flume/hive-log/FlumeData.1534335148147.tmp
2018-08-15 05:13:21,010 (hdfs-k1-call-runner-2) [INFO - org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:655)] Renaming hdfs://Master:9000/user/hadoop/flume/hive-log/FlumeData.1534335148147.tmp to hdfs://Master:9000/user/hadoop/flume/hive-log/FlumeData.1534335148147
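The line "rolling: rollCount: 10, events: 10" shows the sink closing a file after 10 events. agent.conf does not set hdfs.rollCount, so this is the HDFS sink default (the other defaults are hdfs.rollInterval = 30 seconds and hdfs.rollSize = 1024 bytes), and the result is many small files. A minimal sketch that appends larger roll thresholds to the configuration is shown below. The numeric values are illustrative assumptions, not recommendations, and append_roll_settings is a hypothetical helper.

```shell
# Append illustrative roll settings for sink k1 of agent a1 to the file $1.
# The numeric values are assumptions chosen for the example.
append_roll_settings() {
  cat >> "$1" <<'EOF'
# Roll to a new HDFS file every 10 minutes or at about 128 MB, whichever comes first
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 134217728
# 0 disables rolling by event count
a1.sinks.k1.hdfs.rollCount = 0
EOF
}
```

Calling `append_roll_settings conf/agent.conf` from the flume directory and then restarting the agent should apply the new thresholds.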
With the agent running, open the Hive command line and run a few commands. Hive writes them to its log, and Flume collects the new entries.
$ cd hive/
$ ./bin/hive
hive (default)> show tables;
Check the HDFS directory configured in agent.conf: /user/hadoop/flume/hive-log
$ cd hadoop/
$ ./bin/hdfs dfs -ls /user/hadoop/flume/hive-log
Output:
Found 5 items
-rw-r--r-- 1 hadoop supergroup 1158 2018-08-15 05:06 /user/hadoop/flume/hive-log/FlumeData.1534334788517
-rw-r--r-- 1 hadoop supergroup 1036 2018-08-15 05:06 /user/hadoop/flume/hive-log/FlumeData.1534334788518
-rw-r--r-- 1 hadoop supergroup 1036 2018-08-15 05:07 /user/hadoop/flume/hive-log/FlumeData.1534334788519
-rw-r--r-- 1 hadoop supergroup 1225 2018-08-15 05:07 /user/hadoop/flume/hive-log/FlumeData.1534334788520
-rw-r--r-- 1 hadoop supergroup 847 2018-08-15 05:07 /user/hadoop/flume/hive-log/FlumeData.1534334788521.tmp
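The last entry still ends in .tmp. As the agent log shows, the sink writes to a .tmp file and renames it only when the file is closed, so this file is still being written. A small sketch for listing only the finished files is given below; completed_files is a hypothetical helper that filters the text output of hdfs dfs -ls.

```shell
# Hypothetical helper: reads `hdfs dfs -ls` output on stdin and prints the
# paths of regular files whose names do not end in .tmp.
completed_files() {
  awk '$1 ~ /^-/ && $NF !~ /\.tmp$/ { print $NF }'
}
```

Usage from the hadoop directory: `./bin/hdfs dfs -ls /user/hadoop/flume/hive-log | completed_files`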
View the content of a collected file:
$ ./bin/hdfs dfs -cat /user/hadoop/flume/hive-log/FlumeData.1534334788517
Output (Hive runtime log entries):
2018-08-15 05:03:31,525 INFO [main]: ql.Driver (Driver.java:compile(570)) - Semantic Analysis Completed
2018-08-15 05:03:31,526 INFO [main]: ql.Driver (Driver.java:getSchema(303)) - Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
2018-08-15 05:03:31,530 INFO [main]: ql.Driver (Driver.java:compile(690)) - Completed compiling command(queryId=hadoop_20180815050303_474cafaf-4338-4a93-b861-11e0d1297a42); Time taken: 0.01 seconds
2018-08-15 05:03:31,531 INFO [main]: ql.Driver (Driver.java:checkConcurrency(223)) - Concurrency mode is disabled, not creating a lock manager
2018-08-15 05:03:31,531 INFO [main]: ql.Driver (Driver.java:execute(1656)) - Executing command(queryId=hadoop_20180815050303_474cafaf-4338-4a93-b861-11e0d1297a42): show tables
2018-08-15 05:03:31,531 INFO [main]: ql.Driver (Driver.java:launchTask(2050)) - Starting task [Stage-0:DDL] in serial mode
2018-08-15 05:03:31,561 INFO [main]: ql.Driver (Driver.java:execute(1958)) - Completed executing command(queryId=hadoop_20180815050303_474cafaf-4338-4a93-b861-11e0d1297a42); Time taken: 0.03 seconds
That's all!