YARN Resources
1.Roles in YARN
1.1.Client
The client accepts job requests. After receiving one, it sends a request to the RM (Resource Manager), which generates a Job ID for the job.
1.2.Resource Manager
The master node. It manages the compute resources of the whole cluster and allocates them to applications.
1.3.Node Manager
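A compute node. It launches containers according to the relevant settings. The NM (Node Manager) periodically sends heartbeats to the RM to report its health. It also manages the lifecycle of containers, monitors each container's resource usage, and manages logs and the auxiliary services used by different applications.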
1.4.Application Master
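Manages an application running on YARN. The AM (ApplicationMaster) negotiates resources with the RM's scheduler and communicates with NMs to run the corresponding tasks. The RM allocates containers to the AM, and these containers are used to run tasks. The AM also tracks the application's status and monitors the progress of its containers.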
1.5.Container
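The container is the basic unit of resource allocation in YARN and carries a certain amount of memory and CPU. A container grants the AM the right to use a specific amount of resources on a specific host. The AM itself also runs in a container: the first container allocated to the application.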
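2.Job flow on YARN
2.1.The client receives a job and sends a request to the RM (Resource Manager), which assigns the job a Job ID; the job state is set to New;
2.2.The client then submits the job details to the RM; the RM saves them and changes the job state to Submit;
2.3.The RM passes the job information to the scheduler; the scheduler checks the client's permissions and whether there are enough resources to run the AM (Application Master), then sets the job state to Accept;
2.4.The RM allocates a container for the AM and starts the AM in it, then changes the job state to Running;
2.5.Once the AM has started, it negotiates with the RM, requests the resources needed to run the program, and checks status periodically;
2.6.If the job completes as expected, its state becomes Finished. If a failure occurs while it is running, the state becomes Failed. If the client kills the job, the state becomes Killed.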
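The state changes in steps 2.1 to 2.6 can be sketched as a small state machine. This is only an illustration: the state names are the labels used in these notes rather than the exact enum names inside YARN, the `Job` class and `run_happy_path` helper are hypothetical, and Failed and Killed are assumed to be reachable only from Running because that is the only case the steps describe.

```python
# Allowed state changes, following the labels used in steps 2.1-2.6.
# Assumption: failures and kills are modeled only for a Running job.
TRANSITIONS = {
    "New": {"Submit"},
    "Submit": {"Accept"},
    "Accept": {"Running"},
    "Running": {"Finished", "Failed", "Killed"},
    "Finished": set(),
    "Failed": set(),
    "Killed": set(),
}


class Job:
    """Hypothetical job record that tracks its state history."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.state = "New"        # step 2.1: a new Job ID starts in New
        self.history = ["New"]

    def advance(self, new_state):
        # Reject any change that the steps above do not describe.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)


def run_happy_path(job_id):
    """Walk a job through the states of a run that completes normally."""
    job = Job(job_id)
    for state in ("Submit", "Accept", "Running", "Finished"):
        job.advance(state)
    return job


job = run_happy_path("job_0001")
print(job.history)
```

Finished, Failed, and Killed are terminal here, so calling `advance` on a finished job raises `ValueError`.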
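3.Resource management on YARN
3.1.Viewing resources on YARN
Taking CDH as an example:
Open the YARN service.
Click Web UI --> Resource Manager Web UI.
This opens the YARN web UI.
The page shows that YARN has 141.52 GB of memory and 160 virtual CPUs (vcores) available.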
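3.2.How the cluster's available resources are determined
In the YARN configuration, search for yarn.nodemanager.resource.memory-mb.
The value shows that each node can use 35.38 GB of memory. YARN has 4 NMs, so the memory available to YARN is 35.38 * 4 = 141.52 GB.
This setting is the total physical memory an NM may use, which is also the physical memory available to containers.
On the configuration page, search for yarn.nodemanager.resource.cpu-vcores.
This setting is the total number of vcores an NM may use, which is also the number of vcores available to containers.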
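The totals shown in the web UI are just the per-node settings multiplied by the number of NMs. A minimal sketch of that arithmetic follows; `cluster_capacity` is a hypothetical helper, and the 40 vcores per node is an assumption inferred from 160 vcores across 4 identical NMs, since the notes do not state the per-node vcore value directly.

```python
def cluster_capacity(node_count, memory_gb_per_node, vcores_per_node):
    """Return (total memory in GB, total vcores) for identical NodeManagers."""
    # Round to two decimals to match how the web UI displays memory.
    total_memory_gb = round(node_count * memory_gb_per_node, 2)
    total_vcores = node_count * vcores_per_node
    return total_memory_gb, total_vcores


# 4 NMs, 35.38 GB each (from the notes), 40 vcores each (assumed).
print(cluster_capacity(4, 35.38, 40))  # (141.52, 160)
```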
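3.3.Viewing the scheduler's resource settings
Open the YARN web UI --> Scheduler.
The page shows the following about the current YARN cluster:
3.3.1.Scheduler type: Fair Scheduler;
The scheduler type is set with yarn.resourcemanager.scheduler.class, which can be found by searching the YARN configuration page. CDH offers three scheduler types:
FairScheduler, the fair scheduler, aims to give all applications a fair share of resources. There is no need to reserve system resources in advance; FairScheduler dynamically rebalances resources among all running jobs;
FifoScheduler, the first-in-first-out scheduler, places applications in a single queue in submission order. Resources go first to the application at the head of the queue; once its demand is satisfied, the next application is served, and so on;
CapacityScheduler, the capacity scheduler, can dedicate a queue to small jobs so that they start promptly. Because that queue reserves part of the cluster in advance, a large job finishes later than it would under FifoScheduler;
3.3.2.Minimum allocation: 2 GB of memory, 1 vcore;
3.3.3.Maximum allocation: 32 GB of memory, 40 vcores;
The following four settings define these limits: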
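Setting | Description |
---|---|
yarn.scheduler.minimum-allocation-mb | Minimum memory per allocation. A request smaller than this value is raised to it (2 GB on this cluster; the default is 1 GB). |
yarn.scheduler.maximum-allocation-mb | Maximum memory per allocation. A request larger than this value throws InvalidResourceRequestException. |
yarn.scheduler.minimum-allocation-vcores | Minimum vcores per allocation |
yarn.scheduler.maximum-allocation-vcores | Maximum vcores per allocation |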
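The effect of the four settings on a single container request can be sketched as follows. This is a simplified model, not YARN's code: `normalize_request` and the Python exception class are hypothetical stand-ins, the default limits are the values from this cluster (2 GB / 32 GB, 1 / 40 vcores), and treating vcores the same way as memory is an assumption. Any further rounding a real scheduler may apply is not modeled.

```python
class InvalidResourceRequest(Exception):
    """Python stand-in for YARN's InvalidResourceRequestException."""


def normalize_request(memory_mb, vcores,
                      min_mb=2048, max_mb=32768,
                      min_vcores=1, max_vcores=40):
    """Apply the minimum/maximum allocation limits to one container request."""
    # A request above the maximum is rejected outright.
    if memory_mb > max_mb:
        raise InvalidResourceRequest(
            f"requested {memory_mb} MB, maximum is {max_mb} MB")
    if vcores > max_vcores:  # assumption: same rule as for memory
        raise InvalidResourceRequest(
            f"requested {vcores} vcores, maximum is {max_vcores}")
    # A request below the minimum is raised to the minimum.
    return max(memory_mb, min_mb), max(vcores, min_vcores)


print(normalize_request(512, 1))    # (2048, 1)
print(normalize_request(4096, 2))   # (4096, 2)
```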
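4.Spark on YARN
In day-to-day production, Spark programs are submitted to YARN to run.
4.1.Two run modes
1.Cluster mode: the Driver runs inside the Application Master;
2.Client mode: the Driver runs wherever the Spark program is submitted from;
4.2.Launch parameters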
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
Options:
--master MASTER_URL spark://host:port, mesos://host:port, yarn,
k8s://https://host:port, or local (Default: local[*]).
--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
on one of the worker machines inside the cluster ("cluster")
(Default: client).
--class CLASS_NAME Your application's main class (for Java / Scala apps).
--name NAME A name of your application.
--jars JARS Comma-separated list of jars to include on the driver
and executor classpaths.
--packages Comma-separated list of maven coordinates of jars to include
on the driver and executor classpaths. Will search the local
maven repo, then maven central and any additional remote
repositories given by --repositories. The format for the
coordinates should be groupId:artifactId:version.
--exclude-packages Comma-separated list of groupId:artifactId, to exclude while
resolving the dependencies provided in --packages to avoid
dependency conflicts.
--repositories Comma-separated list of additional remote repositories to
search for the maven coordinates given with --packages.
--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place
on the PYTHONPATH for Python apps.
--files FILES Comma-separated list of files to be placed in the working
directory of each executor. File paths of these files
in executors can be accessed via SparkFiles.get(fileName).
--conf PROP=VALUE Arbitrary Spark configuration property.
--properties-file FILE Path to a file from which to load extra properties. If not
specified, this will look for conf/spark-defaults.conf.
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 1024M).
--driver-java-options Extra Java options to pass to the driver.
--driver-library-path Extra library path entries to pass to the driver.
--driver-class-path Extra class path entries to pass to the driver. Note that
jars added with --jars are automatically included in the
classpath.
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).
--proxy-user NAME User to impersonate when submitting the application.
This argument does not work with --principal / --keytab.
--help, -h Show this help message and exit.
--verbose, -v Print additional debug output.
--version, Print the version of current Spark.
Cluster deploy mode only:
--driver-cores NUM Number of cores used by the driver, only in cluster mode
(Default: 1).
Spark standalone or Mesos with cluster deploy mode only:
--supervise If given, restarts the driver on failure.
--kill SUBMISSION_ID If given, kills the driver specified.
--status SUBMISSION_ID If given, requests the status of the driver specified.
Spark standalone and Mesos only:
--total-executor-cores NUM Total cores for all executors.
Spark standalone and YARN only:
--executor-cores NUM Number of cores per executor. (Default: 1 in YARN mode,
or all available cores on the worker in standalone mode)
YARN-only:
--queue QUEUE_NAME The YARN queue to submit to (Default: "default").
--num-executors NUM Number of executors to launch (Default: 2).
If dynamic allocation is enabled, the initial number of
executors will be at least NUM.
--archives ARCHIVES Comma separated list of archives to be extracted into the
working directory of each executor.
--principal PRINCIPAL Principal to be used to login to KDC, while running on
secure HDFS.
--keytab KEYTAB The full path to the file that contains the keytab for the
principal specified above. This keytab will be copied to
the node running the Application Master via the Secure
Distributed Cache, for renewing the login tickets and the
delegation tokens periodically.
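Commonly used parameters:
--master The master to run on: local, yarn, spark://host:port, mesos://host:port, or k8s://https://host:port;
--deploy-mode Where the driver runs: cluster or client;
--num-executors Number of executors to launch;
--executor-cores Number of vcores per executor;
--executor-memory Memory per executor;
--driver-memory Memory for the driver;
--driver-cores Number of vcores for the driver;
--jars Dependency jars needed by the application jar, given as absolute paths and comma-separated;
--class Main class of the application;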
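To show how the commonly used parameters fit together, here is a small sketch that assembles a spark-submit command line for YARN. `build_spark_submit` is a hypothetical helper, and the jar paths and class name are placeholders. The defaults mirror the values in the help output above, and --driver-cores is added only in cluster mode because the help text lists it under "Cluster deploy mode only".

```python
import shlex


def build_spark_submit(app_jar, main_class, *, deploy_mode="cluster",
                       num_executors=2, executor_cores=1,
                       executor_memory="1G", driver_memory="1024M",
                       driver_cores=1, jars=(), app_args=()):
    """Return the spark-submit argument list for a YARN submission."""
    if deploy_mode not in ("cluster", "client"):
        raise ValueError(f"unknown deploy mode: {deploy_mode}")
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
        "--driver-memory", driver_memory,
    ]
    if deploy_mode == "cluster":
        # Per the help output, --driver-cores applies in cluster mode only.
        cmd += ["--driver-cores", str(driver_cores)]
    if jars:
        # Dependency jars are passed as one comma-separated value.
        cmd += ["--jars", ",".join(jars)]
    cmd.append(app_jar)
    cmd.extend(app_args)
    return cmd


# Placeholder paths and class name, for illustration only.
cmd = build_spark_submit(
    "/opt/jobs/example-app.jar", "com.example.Main",
    num_executors=4, executor_cores=2, executor_memory="4G",
    jars=["/opt/libs/dep-a.jar", "/opt/libs/dep-b.jar"],
)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed directly to `subprocess.run` without shell quoting issues; `shlex.join` is used here only to print it in a readable form.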