Big Data Development Environment Setup: Flink Installation and Deployment
2020-10-29
羋学僧
![](https://img.haomeiwen.com/i14270006/df98a2832d41c0b9.png)
I. Standalone Mode Installation
1. Download Flink
Official site
Archive download link provided on the official site:
flink-1.10.1-bin-scala_2.11.tgz
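The steps below assume the archive is already on the server. As a dry-run sketch, the command to fetch it could look like this (the Apache archive URL is an assumption; verify it against the official download page):

```shell
# Dry-run sketch: only prints the wget command to fetch the 1.10.1 binary release.
# The archive.apache.org path is assumed -- check the Flink download page first.
echo "wget https://archive.apache.org/dist/flink/flink-1.10.1/flink-1.10.1-bin-scala_2.11.tgz"
```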
2. Extract Flink
On the bigdata03 server:
cd /home/bigdata/soft/
tar -zxvf flink-1.10.1-bin-scala_2.11.tgz
mv flink-1.10.1/ /home/bigdata/apps/
3. Set the Environment Variables
Command:
vim ~/.bashrc
Append these two lines at the end of the file:
export FLINK_HOME=/home/bigdata/apps/flink-1.10.1/
export PATH=$PATH:$FLINK_HOME/bin
After saving and exiting, run source to make the changes take effect:
source ~/.bashrc
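A quick way to confirm the two lines took effect is to check that `$FLINK_HOME/bin` actually landed on `PATH`. This sketch re-creates the same two exports in the current shell and tests the result:

```shell
# Sanity check: re-create the two ~/.bashrc lines and confirm PATH picked them up.
export FLINK_HOME=/home/bigdata/apps/flink-1.10.1/
export PATH=$PATH:$FLINK_HOME/bin

case ":$PATH:" in
  *":$FLINK_HOME/bin:"*) echo "FLINK_HOME/bin is on PATH" ;;
  *)                     echo "PATH was not updated" ;;
esac
```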
4. Local Mode Installation (single-node Flink)
Start the services:
cd /home/bigdata/apps/flink-1.10.1/
./bin/start-cluster.sh
Stop the services:
./bin/stop-cluster.sh
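After `start-cluster.sh`, `jps` on the host should show both Flink daemons (in Flink 1.10 they appear as `StandaloneSessionClusterEntrypoint` for the JobManager and `TaskManagerRunner` for the TaskManager). This sketch uses hard-coded example output rather than a live capture:

```shell
# Illustrative check -- the jps output below is an example, not captured live.
jps_output='12345 StandaloneSessionClusterEntrypoint
12346 TaskManagerRunner'

# Count how many of the two expected Flink daemons are present.
echo "$jps_output" | grep -E -c 'StandaloneSessionClusterEntrypoint|TaskManagerRunner'
```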
5. Browse the Web UI
http://bigdata03:8081/
![](https://img.haomeiwen.com/i14270006/30e1932ce173d1da.png)
6. Standalone Cluster Installation
Cluster plan:
![](https://img.haomeiwen.com/i14270006/0f19b165cf18d069.png)
7. Configure the Cluster
Edit conf/flink-conf.yaml:
cd /home/bigdata/apps/flink-1.10.1/conf/
vim flink-conf.yaml
Change the following setting:
jobmanager.rpc.address: bigdata03
Edit conf/slaves:
vim slaves
Set the contents to:
bigdata03
bigdata05
8. Copy the flink-1.10.1 directory from bigdata03 to bigdata05
scp -r /home/bigdata/apps/flink-1.10.1/ bigdata@bigdata05:~/apps
9. Start the cluster from the bigdata03 (JobManager) node
start-cluster.sh
10. Browse the Web UI
http://bigdata03:8081/
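Besides the browser UI, port 8081 also serves Flink's REST API; `/overview` is a quick scripted liveness check that reports slot and job counts. This is a dry-run sketch that only prints the command to run on the cluster host:

```shell
# Dry-run sketch: prints a REST liveness check against the JobManager Web port.
echo "curl -s http://bigdata03:8081/overview"
```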
11. Parameters to consider in Standalone mode
jobmanager.heap.size: memory available to the JobManager (jobmanager.heap.mb in older releases)
taskmanager.memory.process.size: total memory of each TaskManager process (taskmanager.heap.mb in older releases)
taskmanager.numberOfTaskSlots: number of task slots per machine, usually set to the number of CPU cores
parallelism.default: default parallelism for jobs
io.tmp.dirs: directories for TaskManager temporary data (taskmanager.tmp.dirs in older releases)
II. Flink on YARN
![](https://img.haomeiwen.com/i14270006/7ce378575ce19ad2.png)
Example from the official site
1. Edit the Configuration Files
flink-conf.yaml
jobmanager.rpc.address: bigdata03
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 2
parallelism.default: 1
high-availability: zookeeper
high-availability.storageDir: hdfs://bigdata02:9000/flink/ha/
high-availability.zookeeper.quorum: bigdata02:2181,bigdata03:2181,bigdata04:2181
# See: https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes
state.backend: filesystem
state.checkpoints.dir: hdfs://bigdata02:9000/flink-checkpoints
state.savepoints.dir: hdfs://bigdata02:9000/flink-savepoints
jobmanager.execution.failover-strategy: region
io.tmp.dirs: /home/bigdata/data/flink/tmp
env.log.dir: /home/bigdata/data/flink/log
masters
bigdata03:8081
bigdata05:8081
slaves
bigdata03
bigdata04
bigdata05
zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
# dataDir=/tmp/zookeeper
# The port at which the clients will connect
clientPort=2181
# ZooKeeper quorum peers
server.0=bigdata02:2888:3888
server.1=bigdata03:2888:3888
server.2=bigdata04:2888:3888
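Before relying on the HA setup, it is worth probing each ZooKeeper peer. ZooKeeper answers the four-letter command `ruok` with `imok` on its client port. This dry-run sketch only prints the probe commands:

```shell
# Dry-run sketch: print a liveness probe for each ZooKeeper peer
# ('ruok' sent to the client port should answer 'imok').
for host in bigdata02 bigdata03 bigdata04; do
  echo "echo ruok | nc ${host} 2181"
done
```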
2. Copy to the Other Machines
scp -r /home/bigdata/apps/flink-1.10.1/ bigdata@bigdata04:~/apps
scp -r /home/bigdata/apps/flink-1.10.1/ bigdata@bigdata05:~/apps
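The two scp commands above can also be written as a loop, which scales better if more workers are added. This is a dry-run sketch that only prints the commands instead of executing them:

```shell
# Dry-run sketch: print the scp command that distributes the Flink directory
# to every other node in the cluster plan.
for host in bigdata04 bigdata05; do
  echo "scp -r /home/bigdata/apps/flink-1.10.1/ bigdata@${host}:~/apps"
done
```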
3. Add the Hadoop JAR
Without it, jobs that access HDFS fail with errors such as:
1. while creating FileSystem when initializing the state of the BucketingSink.
2. Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
3. org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
To access HDFS with Flink 1.10.1, you must place an extra JAR under the lib directory of the Flink installation:
https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/
![](https://img.haomeiwen.com/i14270006/a92b491384fb077f.png)
My Hadoop version is 2.7.7.
![](https://img.haomeiwen.com/i14270006/517fa6c4d5b92e16.png)
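Downloading the uber JAR straight into Flink's `lib/` can be sketched as below. `VERSION` is a placeholder, not a real coordinate: pick the build matching your Hadoop line from the `flink-shaded-hadoop-2-uber` directory listing linked above. This dry run only prints the command:

```shell
# Dry-run sketch: fetch the shaded Hadoop uber JAR into Flink's lib directory.
# VERSION is a placeholder -- substitute the build that matches your Hadoop release.
VERSION="<hadoop-version>-<shaded-version>"
echo "wget -P /home/bigdata/apps/flink-1.10.1/lib https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/${VERSION}/flink-shaded-hadoop-2-uber-${VERSION}.jar"
```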
4. Start
start-cluster.sh
![](https://img.haomeiwen.com/i14270006/e307331406628569.png)
5. Submit a Job
cd /home/bigdata/apps/flink-1.10.1
flink run -m yarn-cluster ./examples/batch/WordCount.jar
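To make the result of the example concrete: the batch WordCount job emits (word, count) pairs. The same shape of output can be sketched with plain coreutils on two sample lines (illustrative only; it does not involve Flink):

```shell
# Illustrative only: the (word, count) result that Flink's batch WordCount
# example computes, sketched with coreutils on two sample input lines.
printf 'hello flink\nhello yarn\n' \
  | tr ' ' '\n' \
  | sort | uniq -c \
  | awk '{print $2, $1}'
# Prints:
# flink 1
# hello 2
# yarn 1
```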
![](https://img.haomeiwen.com/i14270006/4da72bf2b28ab8c3.png)