Flume

2019-05-29  BigBigFlower

Flume is designed for bulk ingestion of event-based streaming data into Hadoop. A Flume installation is made up of a collection of connected agents running in a distributed topology. Agents at the edge of the system collect the data and forward it to agents responsible for aggregation, which then store the data in its final destination. Agents are configured to run a particular set of sources (where data comes from) and sinks (where data goes). A Flume agent is a long-running Java process that runs sources and sinks, connected by channels.

Transactions and reliability
Flume uses two separate transactions to guarantee delivery of events from the source to the channel, and from the channel to the sink. If delivery fails, the transaction is rolled back and the event is redelivered later, so every event reaches its destination at least once; duplicates are possible, giving at-least-once semantics overall. How strong the guarantee is also depends on the channel type: a file channel persists events to disk and survives an agent restart, while a memory channel is faster but loses buffered events if the agent process dies.

# Flume configuration using a spooling directory source and a logger sink
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir

agent1.sinks.sink1.type = logger

agent1.channels.channel1.type = file
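
To try it out, start the agent with the flume-ng command (a sketch, assuming the configuration above is saved as spool-to-logger.properties; the file name is arbitrary). The spooling directory source requires that files be complete and immutable once they appear, so from a second terminal write to a hidden name first and rename it into place:

% flume-ng agent \
    --conf-file spool-to-logger.properties \
    --name agent1 \
    --conf $FLUME_HOME/conf \
    -Dflume.root.logger=INFO,console

% echo "Hello Flume" > /tmp/spooldir/.file1.txt
% mv /tmp/spooldir/.file1.txt /tmp/spooldir/file1.txt

The logger sink prints the event body on the console, and the source marks the ingested file by renaming it to file1.txt.COMPLETED.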

HDFS sink

# Flume configuration using a spooling directory source and an HDFS sink
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /tmp/flume
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream

agent1.channels.channel1.type = file
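
A few notes on the HDFS sink settings: while a file is still open it carries the in-use prefix _ (so MapReduce jobs ignore the incomplete file), and when it rolls it is renamed to its final name with the events prefix and .log suffix; fileType = DataStream writes the raw event bodies rather than the default SequenceFile. Once a file has rolled, a quick check of the output (assuming HDFS is the default filesystem):

% hadoop fs -cat /tmp/flume/events*.log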

Fan out is the delivery of events from one source to multiple channels, and hence to multiple sinks. In the configuration below, the replicating selector (the default) copies each event to both channels. Marking channel1b as optional means that a failure to write to it does not fail the delivery as a whole, so events may be lost on that path; that is acceptable here, since the memory channel feeding the logger sink is only used for monitoring.

A Flume agent with a spooling directory source, fanning out to an HDFS sink and a logger sink
# Flume configuration using a spooling directory source, fanning out to HDFS and logger sinks
agent1.sources = source1
agent1.sinks = sink1a sink1b
agent1.channels = channel1a channel1b

agent1.sources.source1.channels = channel1a channel1b
agent1.sources.source1.selector.type = replicating
agent1.sources.source1.selector.optional = channel1b
agent1.sinks.sink1a.channel = channel1a
agent1.sinks.sink1b.channel = channel1b

agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir

agent1.sinks.sink1a.type = hdfs
agent1.sinks.sink1a.hdfs.path = /tmp/flume
agent1.sinks.sink1a.hdfs.filePrefix = events
agent1.sinks.sink1a.hdfs.fileSuffix = .log
agent1.sinks.sink1a.hdfs.fileType = DataStream

agent1.sinks.sink1b.type = logger

agent1.channels.channel1a.type = file
agent1.channels.channel1b.type = memory
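
Besides replicating events to every channel, Flume sources also support a multiplexing selector, which routes each event to a channel according to the value of one of its headers. A minimal sketch (the datacenter header and its values are illustrative, not part of the example above):

agent1.sources.source1.selector.type = multiplexing
agent1.sources.source1.selector.header = datacenter
agent1.sources.source1.selector.mapping.NYC = channel1a
agent1.sources.source1.selector.mapping.LON = channel1b
agent1.sources.source1.selector.default = channel1a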

Distribution: agent tiers


Aggregating Flume events from the first tier with a second agent tier

Agents in the first tier collect events from the original sources and send them to the second tier. There are fewer second-tier agents than first-tier agents; they aggregate the events arriving from the first tier before writing them to HDFS. Tiering also adds buffering capacity to the system, helping it absorb spikes in load.

# Configuration for a two-tier Flume deployment using a spooling directory source and an HDFS sink
# First tier agent

agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir

agent1.sinks.sink1.type = avro
agent1.sinks.sink1.hostname = localhost
agent1.sinks.sink1.port = 10000

agent1.channels.channel1.type = file
agent1.channels.channel1.checkpointDir=/tmp/agent1/file-channel/checkpoint
agent1.channels.channel1.dataDirs=/tmp/agent1/file-channel/data

# Second tier agent

agent2.sources = source2
agent2.sinks = sink2
agent2.channels = channel2

agent2.sources.source2.channels = channel2
agent2.sinks.sink2.channel = channel2

agent2.sources.source2.type = avro
agent2.sources.source2.bind = localhost
agent2.sources.source2.port = 10000

agent2.sinks.sink2.type = hdfs
agent2.sinks.sink2.hdfs.path = /tmp/flume
agent2.sinks.sink2.hdfs.filePrefix = events
agent2.sinks.sink2.hdfs.fileSuffix = .log
agent2.sinks.sink2.hdfs.fileType = DataStream

agent2.channels.channel2.type = file
agent2.channels.channel2.checkpointDir=/tmp/agent2/file-channel/checkpoint
agent2.channels.channel2.dataDirs=/tmp/agent2/file-channel/data
Two Flume agent tiers connected by an Avro sink-source pair
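
Both agents' definitions can live in the same properties file, since every key is prefixed with its agent name; each agent is then started with its own --name. A sketch, assuming the combined configuration above is saved as spool-to-hdfs-tiered.properties (start agent2 first, so its Avro source is listening when agent1's Avro sink connects):

% flume-ng agent \
    --conf-file spool-to-hdfs-tiered.properties \
    --name agent2 \
    --conf $FLUME_HOME/conf

# in another terminal
% flume-ng agent \
    --conf-file spool-to-hdfs-tiered.properties \
    --name agent1 \
    --conf $FLUME_HOME/conf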

Sink groups
A sink group allows multiple sinks to be treated as a single sink, for failover or load-balancing purposes. If one second-tier agent becomes unavailable, events are routed to another second-tier agent, so the flow of events into HDFS continues uninterrupted.


Using multiple sinks for load balancing or failover
Load balancing between two second-tier agents
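
Concretely, the first tier's sink section changes along these lines (a sketch: the group name, the second sink, and port 10001 for a second second-tier agent are illustrative). Both sinks drain the same channel, and the load_balance sink processor distributes events between them, backing off from a failed sink so it is only retried after a timeout:

agent1.sinks = sink1a sink1b
agent1.sinkgroups = sinkgroup1
agent1.sinkgroups.sinkgroup1.sinks = sink1a sink1b
agent1.sinkgroups.sinkgroup1.processor.type = load_balance
agent1.sinkgroups.sinkgroup1.processor.backoff = true

agent1.sinks.sink1a.channel = channel1
agent1.sinks.sink1a.type = avro
agent1.sinks.sink1a.hostname = localhost
agent1.sinks.sink1a.port = 10000

agent1.sinks.sink1b.channel = channel1
agent1.sinks.sink1b.type = avro
agent1.sinks.sink1b.hostname = localhost
agent1.sinks.sink1b.port = 10001

For failover rather than load balancing, setting processor.type = failover sends all events to one preferred sink and switches to the other only when it fails.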