Apache Flink

Flink之运行模式

2019-03-19  本文已影响0人  MrSocean

Apache Flink 初识

Apache Flink作为Apache的顶级项目,固然集众多优点于一身。Flink具有分布式MR一类平台的高效性,灵活性和扩展性。同时,Flink 还支持批量和局域流的数据分析,而且提供基于Java和Scala的API。总的来说,Flink是一个分布式,高性能,高可用,准确的,基于Java实现的通用大数据分析引擎。引用官网的一句话介绍Flink:基于数据流的有状态计算-Stateful Computations over Data Streams。

flink.png

Flink运行模式

1:Flink和Spark一样有三种部署模式,分别是Local,Standalone Cluster和Yarn Cluster。本文主要是介绍在Yarn Cluster模式下,Flink任务的执行和资源分配是如何的!
flink 运行方式.png
2:启动yarn-session时需要指定的参数:
 Usage:
   Required
     -n,--container <arg>   Number of YARN container to allocate (=Number of Task Managers)
   Optional
     -D <property=value>             use value for given property
     -d,--detached                   If present, runs the job in detached mode
     -h,--help                       Help for the Yarn session CLI.
     -id,--applicationId <arg>       Attach to running YARN session
     -j,--jar <arg>                  Path to Flink jar file
     -jm,--jobManagerMemory <arg>    Memory for JobManager Container with optional unit (default: MB)
     -m,--jobmanager <arg>           Address of the JobManager (master) to which to connect. Use this flag to connect to a different JobManager than the one specified in the configuration.
     -n,--container <arg>            Number of YARN container to allocate (=Number of Task Managers)
     -nl,--nodeLabel <arg>           Specify YARN node label for the YARN application
     -nm,--name <arg>                Set a custom name for the application on YARN
     -q,--query                      Display available YARN resources (memory, cores)
     -qu,--queue <arg>               Specify YARN queue.
     -s,--slots <arg>                Number of slots per TaskManager
     -sae,--shutdownOnAttachedExit   If the job is submitted in attached mode, perform a best-effort cluster shutdown when the CLI is terminated abruptly, e.g., in response to a user interrupt, such
                                 as typing Ctrl + C.
     -st,--streaming                 Start Flink in streaming mode
     -t,--ship <arg>                 Ship files in the specified directory (t for transfer)
     -tm,--taskManagerMemory <arg>   Memory per TaskManager Container with optional unit (default: MB)
     -yd,--yarndetached              If present, runs the job in detached mode (deprecated; use non-YARN specific option instead)
     -z,--zookeeperNamespace <arg>   Namespace to create the Zookeeper sub-paths for high availability mode

第一种方式提交flink程序:

首先我们启动yarn-session
./bin/yarn-session.sh -n 2 -s 2 -jm 2048 -tm 4096 -nm flink_session_cluster_20190320
yarn-session-方式1.png
然后我们在这个flink集群中提交任务,在我们提交任务的时候需要指定yid,yid就是我们上面开启集群所属的ID:application_1546585584446_0144,这个时候提交的任务就会到我们开启的集群中。
flink run \
-yid application_1546585584446_0144 \
-c com.hfjy.bigdata.ls.nginx.parsenginx.AliyunOnlineParseNginx /opt/jars  /online_aliyun_ls_parse_nginx_test.jar \
--output ${elasticsearch} \
--ipDbPath /opt/lib/ \
--windowSize 10
yarn-session-job.png

第二种方式提交flink程序

./bin/flink 提交任务所需参数:
"run" action options:
 -c,--class <classname>               Class with the program entry point
                                      ("main" method or "getPlan()" method.
                                      Only needed if the JAR file does not
                                      specify the class in its manifest.
 -C,--classpath <url>                 Adds a URL to each user code
                                      classloader  on all nodes in the
                                      cluster. The paths must specify a
                                      protocol (e.g. file://) and be
                                      accessible on all nodes (e.g. by means
                                      of a NFS share). You can use this
                                      option multiple times for specifying
                                      more than one URL. The protocol must
                                      be supported by the {@link
                                      java.net.URLClassLoader}.
 -d,--detached                        If present, runs the job in detached
                                      mode
 -n,--allowNonRestoredState           Allow to skip savepoint state that
                                      cannot be restored. You need to allow
                                      this if you removed an operator from
                                      your program that was part of the
                                      program when the savepoint was
                                      triggered.
 -p,--parallelism <parallelism>       The parallelism with which to run the
                                      program. Optional flag to override the
                                      default value specified in the
                                      configuration.
 -q,--sysoutLogging                   If present, suppress logging output to
                                      standard out.
 -s,--fromSavepoint <savepointPath>   Path to a savepoint to restore the job
                                      from (for example
                                      hdfs:///flink/savepoint-1537).
 -sae,--shutdownOnAttachedExit        If the job is submitted in attached
                                      mode, perform a best-effort cluster
                                      shutdown when the CLI is terminated
                                      abruptly, e.g., in response to a user
                                      interrupt, such as typing Ctrl + C.
Options for yarn-cluster mode:
 -d,--detached                        If present, runs the job in detached
                                      mode
 -m,--jobmanager <arg>                Address of the JobManager (master) to
                                      which to connect. Use this flag to
                                      connect to a different JobManager than
                                      the one specified in the
                                      configuration.
 -sae,--shutdownOnAttachedExit        If the job is submitted in attached
                                      mode, perform a best-effort cluster
                                      shutdown when the CLI is terminated
                                      abruptly, e.g., in response to a user
                                      interrupt, such as typing Ctrl + C.
 -yD <property=value>                 use value for given property
 -yd,--yarndetached                   If present, runs the job in detached
                                      mode (deprecated; use non-YARN
                                      specific option instead)
 -yh,--yarnhelp                       Help for the Yarn session CLI.
 -yid,--yarnapplicationId <arg>       Attach to running YARN session
 -yj,--yarnjar <arg>                  Path to Flink jar file
 -yjm,--yarnjobManagerMemory <arg>    Memory for JobManager Container with
                                      optional unit (default: MB)
 -yn,--yarncontainer <arg>            Number of YARN container to allocate
                                      (=Number of Task Managers)
 -ynl,--yarnnodeLabel <arg>           Specify YARN node label for the YARN
                                      application
 -ynm,--yarnname <arg>                Set a custom name for the application
                                      on YARN
 -yq,--yarnquery                      Display available YARN resources
                                      (memory, cores)
 -yqu,--yarnqueue <arg>               Specify YARN queue.
 -ys,--yarnslots <arg>                Number of slots per TaskManager
 -yst,--yarnstreaming                 Start Flink in streaming mode
 -yt,--yarnship <arg>                 Ship files in the specified directory
                                      (t for transfer)
 -ytm,--yarntaskManagerMemory <arg>   Memory per TaskManager Container with
                                      optional unit (default: MB)
 -yz,--yarnzookeeperNamespace <arg>   Namespace to create the Zookeeper
                                      sub-paths for high availability mode
 -z,--zookeeperNamespace <arg>        Namespace to create the Zookeeper
                                      sub-paths for high availability mode

Options for default mode:
 -m,--jobmanager <arg>           Address of the JobManager (master) to which
                                 to connect. Use this flag to connect to a
                                 different JobManager than the one specified
                                 in the configuration.
 -z,--zookeeperNamespace <arg>   Namespace to create the Zookeeper sub-paths
                                 for high availability mode
提交flink任务
flink run \
-m yarn-cluster \
-ynm AliyunNginxStudy2 \
-yn 1 \
-ys 3 \
-p 3 \
-yjm 2048m \
-ytm 8192m \
-c com.hfjy.bigdata.ls.nginx.parsenginx.AliyunOnlineParseNginx /opt/jars/online_aliyun_ls_parse_nginx_test.jar \
--output ${elasticsearch} \
--ipDbPath /opt/lib/ \
--windowSize 10
yarn-session-方式2.png
>>记录学习过程,文章中如有错误或不妥之处,请留言!<<
上一篇 下一篇

猜你喜欢

热点阅读