Spark2x on YARN Logging Configuration Explained

2019-12-18  super_wing

Overview

The log configuration for Spark on YARN falls into two categories:

  1. Spark on YARN client mode
  2. Spark on YARN cluster mode

Each is covered in turn below.

Log configuration in Spark on YARN client mode

In client mode, a Spark application consists of three parts: the driver, the application master, and the executors. This mode is typically used in test environments.

With this in mind, let's look at the corresponding log configuration:

 # Driver-side log configuration in client mode
 # (the driver runs on the submitting machine, so log4j.properties must be
 # resolvable locally, e.g. on the classpath or as a file: URL)
 spark-submit \
 --class com.hm.spark.Application \
 --master yarn \
 --deploy-mode client \
 --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
 --files /home/hadoop/spark-workspace/log4j.properties \
 /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar

 # Application-master-side log configuration in client mode
 # (--files uploads the local log4j.properties into each YARN container's working directory)
 spark-submit \
 --class com.hm.spark.Application \
 --master yarn \
 --deploy-mode client \
 --conf "spark.yarn.am.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
 --files /home/hadoop/spark-workspace/log4j.properties \
 /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar

 # Executor-side log configuration
 spark-submit \
 --class com.hm.spark.Application \
 --master yarn \
 --deploy-mode client \
 --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
 --files /home/hadoop/spark-workspace/log4j.properties \
 /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar
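
The three settings are independent, so in practice they are usually combined into a single submission. A minimal sketch, reusing the same placeholder class and paths as above, that configures driver, application master, and executor logging in one client-mode command:

 spark-submit \
 --class com.hm.spark.Application \
 --master yarn \
 --deploy-mode client \
 --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
 --conf "spark.yarn.am.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
 --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
 --files /home/hadoop/spark-workspace/log4j.properties \
 /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar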

Log configuration in Spark on YARN cluster mode

In cluster mode, a Spark application consists of two parts: the driver, which runs inside the application master, and the executors. This mode is typically used in production environments.

With this in mind, let's look at the corresponding log configuration:

 # Driver-side log configuration in cluster mode
 # (the driver runs inside the application master, so spark.driver.extraJavaOptions applies to it)
 spark-submit \
 --class com.hm.spark.Application \
 --master yarn \
 --deploy-mode cluster \
 --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
 --files /home/hadoop/spark-workspace/log4j.properties \
 /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar

 # Executor-side log configuration
 spark-submit \
 --class com.hm.spark.Application \
 --master yarn \
 --deploy-mode cluster \
 --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
 --files /home/hadoop/spark-workspace/log4j.properties \
 /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar
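
Likewise, the driver and executor options can be combined in a single cluster-mode submission. A minimal sketch with the same placeholders:

 spark-submit \
 --class com.hm.spark.Application \
 --master yarn \
 --deploy-mode cluster \
 --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
 --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
 --files /home/hadoop/spark-workspace/log4j.properties \
 /home/hadoop/spark-workspace/my-spark-etl-assembly-1.0-SNAPSHOT.jar

Because the driver runs inside the application master in cluster mode, its output lands in the YARN container logs; with log aggregation enabled it can be retrieved after the application finishes with yarn logs -applicationId <application_id>.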

Contents of the log configuration file

In client mode, a template log4j configuration for the driver is:

# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

Writing to the console makes it easy to view the driver's log directly on the machine where the application is submitted.

Other log configurations

log4j.rootLogger=INFO,rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
# ${log} is substituted from the "log" system property, which must point to the log directory
log4j.appender.rolling.File=${log}/abc.log
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
# roll the file at the size limit and keep at most 10 backups
log4j.appender.rolling.maxFileSize=2KB
log4j.appender.rolling.maxBackupIndex=10

A rolling appender like this is recommended so that the logs cannot grow without bound and fill up the disk.
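
On YARN it is often convenient to point the rolling file at the container's log directory, so that the file shows up in the YARN web UI and is picked up by log aggregation. A minimal sketch, assuming the spark.yarn.app.container.log.dir property that Spark sets for YARN container JVMs (the file name spark.log and the size limits are only examples):

log4j.rootLogger=INFO,rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
# Spark sets this property to the YARN container's log directory
log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/spark.log
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%d] %p %m (%c)%n
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5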
