Assorted Spark SQL execution errors

2019-10-25  邵红晓

Debugging goal: figure out why this job keeps dying. Two suspects:
1. Data skew in the input (a quick check is sketched below).
2. Too many retries during Spark SQL execution.
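For suspect 1, the quickest check is counting rows per join/group key and looking for outliers. This is only a sketch — dw_app_source and app_key are placeholder names, not the real schema behind Etl_dw_app:

SELECT app_key, COUNT(*) AS cnt
FROM dw_app_source
GROUP BY app_key
ORDER BY cnt DESC
LIMIT 20

If the top few keys dwarf the rest, the shuffle partitions that receive them will be the ones blowing memory limits and dropping shuffle files.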

Log 1

19/10/25 16:29:39 INFO TaskSetManager: Task 27.1 in stage 9.4 (TID 3548) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
19/10/25 16:29:39 WARN TaskSetManager: Lost task 61.1 in stage 9.4 (TID 3562, shyt-hadoop-4019.*.com.cn): FetchFailed(null, shuffleId=2, mapId=-1, reduceId=194, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 2
 at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:548)
 at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:544)
 at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
 at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
 at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
 at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:544)
 at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:155)
 at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:47)
 at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:166)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
 at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:89)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:748)

)
19/10/25 16:29:39 INFO TaskSetManager: Task 61.1 in stage 9.4 (TID 3562) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
19/10/25 16:29:39 INFO DAGScheduler: Resubmitting failed stages
19/10/25 16:29:39 INFO DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[26] at sql at Etl_dw_app.scala:179), which has no missing parents
19/10/25 16:29:39 INFO MemoryStore: Block broadcast_52 stored as values in memory (estimated size 21.8 KB, free 11.3 GB)
19/10/25 16:29:39 INFO MemoryStore: Block broadcast_52_piece0 stored as bytes in memory (estimated size 9.0 KB, free 11.3 GB)
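The key line in Log 1 is MetadataFetchFailedException: Missing an output location for shuffle 2 — the MapOutputTracker no longer knows where the map-side output lives, which is what you see after the executor that wrote it has been lost, and the DAGScheduler has no choice but to resubmit the parent stage. One mitigation worth trying is the external shuffle service, which lets the NodeManager keep serving shuffle files even after the executor JVM dies; this assumes the spark_shuffle aux-service is actually deployed on the cluster's NodeManagers, which I have not verified here:

--conf spark.shuffle.service.enabled=true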

Log 2

java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
 at sun.nio.ch.IOUtil.read(IOUtil.java:192)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
 at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
 at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
 at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
 at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
 at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 at java.lang.Thread.run(Thread.java:748)

Log 3
spark-submit --master yarn-client --class Etl_dw_app --driver-memory 16g --executor-memory 8G --executor-cores 4 --num-executors 80 --conf spark.port.maxRetries=100 --conf spark.sql.shuffle.partitions=800 --conf spark.default.parallelism=960 /export6/home/*data/HadoopCommit29/log/dw_app_ana/Etl_dw_app.jar 20190926

ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 8.9 GB of 8.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
org.apache.spark.shuffle.FetchFailedException: Failed to connect to shyt-hadoop-4020.*.com.cn/10.32.40.20:8339
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
    at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
    at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:735)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:735)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to shyt-hadoop-4020.*.com.cn/10.32.40.20:8339
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:168)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:141)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:171)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.net.ConnectException: Connection refused: shyt-hadoop-4020.*.com.cn/10.32.40.20:8339
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
    
    
    
    
19/10/25 17:26:29 INFO TaskSetManager: Task 0.1 in stage 9.3 (TID 3560) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
19/10/25 17:26:29 INFO DAGScheduler: Job 3 failed: sql at Etl_dw_app.scala:179, took 56.952067 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: ShuffleMapStage 9 (sql at Etl_dw_app.scala:179) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 2
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:548)
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:544)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:544)
    at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:155)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:47)
    at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:166)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1433)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1421)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1420)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1420)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1260)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1639)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:622)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1831)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1844)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.saveAsHiveFile(InsertIntoHiveTable.scala:150)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:268)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:193)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:352)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
    at Etl_dw_app$.writePartitionTable(Etl_dw_app.scala:179)
    at Etl_dw_app$.main(Etl_dw_app.scala:41)
    at Etl_dw_app.main(Etl_dw_app.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:750)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
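YARN spells out the root cause of Log 3 right at the top: 8.9 GB of 8.5 GB physical memory used — the executor's 8 GB heap plus off-heap overhead blew past the container limit, YARN killed the container, its shuffle output disappeared, and everything downstream surfaced as FetchFailed / MetadataFetchFailed. Following the hint in the message itself, the overhead can be raised explicitly; the 2048 (MB) below is an illustrative value, not a measured one:

spark-submit --master yarn-client --class Etl_dw_app --driver-memory 16g --executor-memory 8G --executor-cores 4 --num-executors 80 --conf spark.yarn.executor.memoryOverhead=2048 --conf spark.port.maxRetries=100 --conf spark.sql.shuffle.partitions=800 --conf spark.default.parallelism=960 /export6/home/*data/HadoopCommit29/log/dw_app_ana/Etl_dw_app.jar 20190926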

Log 4
spark-submit --master yarn-client --class Etl_dw_app --driver-memory 16g --executor-memory 8G --executor-cores 4 --num-executors 60 --conf spark.port.maxRetries=100 --conf spark.sql.shuffle.partitions=720 --conf spark.default.parallelism=720 /export6/home/*data/HadoopCommit29/log/dw_app_ana/Etl_dw_app.jar 20190926



19/10/25 17:31:19 INFO TaskSetManager: Finished task 605.0 in stage 1.0 (TID 746) in 1852 ms on shyt-hadoop-4022.*.com.cn (640/720)
19/10/25 17:31:19 WARN TaskSetManager: Lost task 688.0 in stage 1.0 (TID 829, shyt-hadoop-4014.*.com.cn): java.io.IOException: Failed to create local dir in /export2/hadoop/yarn/local/usercache/*data/appcache/application_1571902955229_1956/blockmgr-2eb7abc4-c00f-45e5-b7be-fdf751a8f0e3/23.
    at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:73)
    at org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:83)
    at org.apache.spark.shuffle.IndexShuffleBlockResolver.getDataFile(IndexShuffleBlockResolver.scala:53)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:69)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

19/10/25 17:31:19 INFO TaskSetManager: Starting task 688.1 in stage 1.0 (TID 861, shyt-hadoop-4011.*.com.cn, partition 688,PROCESS_LOCAL, 2245 bytes)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 618.0 in stage 1.0 (TID 759) in 1754 ms on shyt-hadoop-4030.*.com.cn (641/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 658.0 in stage 1.0 (TID 799) in 1305 ms on shyt-hadoop-4012.*.com.cn (642/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 656.0 in stage 1.0 (TID 797) in 1336 ms on shyt-hadoop-4020.*.com.cn (643/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 654.0 in stage 1.0 (TID 795) in 1393 ms on shyt-hadoop-4021.*.com.cn (644/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 624.0 in stage 1.0 (TID 765) in 1775 ms on shyt-hadoop-4030.*.com.cn (645/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 660.0 in stage 1.0 (TID 801) in 1340 ms on shyt-hadoop-4027.*.com.cn (646/720)
19/10/25 17:31:19 WARN TaskSetManager: Lost task 688.1 in stage 1.0 (TID 861, shyt-hadoop-4011.*.com.cn): FetchFailed(BlockManagerId(13, shyt-hadoop-4014.*.com.cn, 10065), shuffleId=0, mapId=27, reduceId=688, message=
org.apache.spark.shuffle.FetchFailedException: java.io.FileNotFoundException: /export4/hadoop/yarn/local/usercache/*data/appcache/application_1571902955229_1956/blockmgr-d33a6e24-7b64-42b8-84b8-857c5e842344/21/shuffle_0_27_0.index (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:197)
    at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:298)
    at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)
    at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)
    at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
    at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:748)

    at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
    at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
    at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:735)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:735)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /export4/hadoop/yarn/local/usercache/*data/appcache/application_1571902955229_1956/blockmgr-d33a6e24-7b64-42b8-84b8-857c5e842344/21/shuffle_0_27_0.index (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:197)
    at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:298)
    at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)
    at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:58)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:58)
    at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
    at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:748)

    at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:186)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:106)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more

)
19/10/25 17:31:19 INFO TaskSetManager: Task 688.1 in stage 1.0 (TID 861) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
19/10/25 17:31:19 INFO DAGScheduler: Marking ShuffleMapStage 1 (map at Etl_dw_app.scala:54) as failed due to a fetch failure from ShuffleMapStage 0 (map at Etl_dw_app.scala:54)
19/10/25 17:31:19 INFO DAGScheduler: ShuffleMapStage 1 (map at Etl_dw_app.scala:54) failed in 8.274 s
19/10/25 17:31:19 INFO DAGScheduler: Resubmitting ShuffleMapStage 0 (map at Etl_dw_app.scala:54) and ShuffleMapStage 1 (map at Etl_dw_app.scala:54) due to fetch failure
19/10/25 17:31:19 INFO DAGScheduler: Executor lost: 13 (epoch 1)
19/10/25 17:31:19 INFO BlockManagerMasterEndpoint: Trying to remove executor 13 from BlockManagerMaster.
19/10/25 17:31:19 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(13, shyt-hadoop-4014.*.com.cn, 10065)
19/10/25 17:31:19 INFO BlockManagerMaster: Removed 13 successfully in removeExecutor
19/10/25 17:31:19 INFO ShuffleMapStage: ShuffleMapStage 1 is now unavailable on executor 13 (622/720, false)
19/10/25 17:31:19 INFO ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 13 (133/141, false)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 684.0 in stage 1.0 (TID 825) in 997 ms on shyt-hadoop-4016.*.com.cn (648/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 647.0 in stage 1.0 (TID 788) in 1566 ms on shyt-hadoop-4025.*.com.cn (649/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 649.0 in stage 1.0 (TID 790) in 1563 ms on shyt-hadoop-4023.*.com.cn (650/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 634.0 in stage 1.0 (TID 775) in 1782 ms on shyt-hadoop-4024.*.com.cn (651/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 606.0 in stage 1.0 (TID 747) in 1979 ms on shyt-hadoop-4025.*.com.cn (652/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 699.0 in stage 1.0 (TID 840) in 874 ms on shyt-hadoop-4029.*.com.cn (653/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 716.0 in stage 1.0 (TID 857) in 629 ms on shyt-hadoop-4017.*.com.cn (654/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 632.0 in stage 1.0 (TID 773) in 1832 ms on shyt-hadoop-4030.*.com.cn (655/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 554.0 in stage 1.0 (TID 695) in 2450 ms on shyt-hadoop-4024.*.com.cn (656/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 642.0 in stage 1.0 (TID 783) in 1691 ms on shyt-hadoop-4021.*.com.cn (657/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 662.0 in stage 1.0 (TID 803) in 1399 ms on shyt-hadoop-4019.*.com.cn (658/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 673.0 in stage 1.0 (TID 814) in 1281 ms on shyt-hadoop-4011.*.com.cn (659/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 657.0 in stage 1.0 (TID 798) in 1493 ms on shyt-hadoop-4026.*.com.cn (660/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 663.0 in stage 1.0 (TID 804) in 1412 ms on shyt-hadoop-4025.*.com.cn (661/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 652.0 in stage 1.0 (TID 793) in 1639 ms on shyt-hadoop-4030.*.com.cn (662/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 685.0 in stage 1.0 (TID 826) in 1132 ms on shyt-hadoop-4029.*.com.cn (663/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 686.0 in stage 1.0 (TID 827) in 1135 ms on shyt-hadoop-4015.*.com.cn (664/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 669.0 in stage 1.0 (TID 810) in 1381 ms on shyt-hadoop-4023.*.com.cn (665/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 698.0 in stage 1.0 (TID 839) in 1021 ms on shyt-hadoop-4011.*.com.cn (666/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 627.0 in stage 1.0 (TID 768) in 1982 ms on shyt-hadoop-4024.*.com.cn (667/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 711.0 in stage 1.0 (TID 852) in 839 ms on shyt-hadoop-4019.*.com.cn (668/720)
19/10/25 17:31:19 INFO DAGScheduler: Resubmitting failed stages
19/10/25 17:31:19 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[4] at map at Etl_dw_app.scala:54), which has no missing parents
19/10/25 17:31:19 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 19.2 KB, free 11.3 GB)
19/10/25 17:31:19 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 8.3 KB, free 11.3 GB)
19/10/25 17:31:19 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.32.40.27:11129 (size: 8.3 KB, free: 11.3 GB)
19/10/25 17:31:19 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1008
19/10/25 17:31:19 INFO DAGScheduler: Submitting 8 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[4] at map at Etl_dw_app.scala:54)
19/10/25 17:31:19 INFO YarnScheduler: Adding task set 0.1 with 8 tasks
19/10/25 17:31:19 INFO TaskSetManager: Starting task 7.0 in stage 0.1 (TID 862, shyt-hadoop-4021.*.com.cn, partition 123,NODE_LOCAL, 2457 bytes)
19/10/25 17:31:19 INFO TaskSetManager: Starting task 2.0 in stage 0.1 (TID 863, shyt-hadoop-4025.*.com.cn, partition 102,NODE_LOCAL, 2457 bytes)
19/10/25 17:31:19 INFO TaskSetManager: Starting task 1.0 in stage 0.1 (TID 864, shyt-hadoop-4011.*.com.cn, partition 93,NODE_LOCAL, 2457 bytes)
19/10/25 17:31:19 INFO TaskSetManager: Starting task 0.0 in stage 0.1 (TID 865, shyt-hadoop-4014.*.com.cn, partition 27,NODE_LOCAL, 2457 bytes)
19/10/25 17:31:19 INFO TaskSetManager: Starting task 5.0 in stage 0.1 (TID 866, shyt-hadoop-4022.*.com.cn, partition 120,NODE_LOCAL, 2457 bytes)
19/10/25 17:31:19 INFO TaskSetManager: Starting task 4.0 in stage 0.1 (TID 867, shyt-hadoop-4016.*.com.cn, partition 118,NODE_LOCAL, 2457 bytes)
19/10/25 17:31:19 INFO TaskSetManager: Starting task 6.0 in stage 0.1 (TID 868, shyt-hadoop-4016.*.com.cn, partition 121,NODE_LOCAL, 2457 bytes)
19/10/25 17:31:19 INFO TaskSetManager: Starting task 3.0 in stage 0.1 (TID 869, shyt-hadoop-4025.*.com.cn, partition 112,NODE_LOCAL, 2457 bytes)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 690.0 in stage 1.0 (TID 831) in 1121 ms on shyt-hadoop-4015.*.com.cn (669/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 703.0 in stage 1.0 (TID 844) in 946 ms on shyt-hadoop-4012.*.com.cn (670/720)
19/10/25 17:31:19 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on shyt-hadoop-4021.*.com.cn:25221 (size: 8.3 KB, free: 5.5 GB)
19/10/25 17:31:19 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on shyt-hadoop-4022.*.com.cn:8199 (size: 8.3 KB, free: 5.5 GB)
19/10/25 17:31:19 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on shyt-hadoop-4016.*.com.cn:30815 (size: 8.3 KB, free: 5.5 GB)
19/10/25 17:31:19 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on shyt-hadoop-4011.*.com.cn:20572 (size: 8.3 KB, free: 5.5 GB)
19/10/25 17:31:19 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on shyt-hadoop-4025.*.com.cn:33244 (size: 8.3 KB, free: 5.5 GB)
19/10/25 17:31:19 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on shyt-hadoop-4016.*.com.cn:31181 (size: 8.3 KB, free: 5.5 GB)
19/10/25 17:31:19 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on shyt-hadoop-4025.*.com.cn:16228 (size: 8.3 KB, free: 5.5 GB)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 701.0 in stage 1.0 (TID 842) in 1001 ms on shyt-hadoop-4013.*.com.cn (671/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 680.0 in stage 1.0 (TID 821) in 1324 ms on shyt-hadoop-4012.*.com.cn (672/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 676.0 in stage 1.0 (TID 817) in 1436 ms on shyt-hadoop-4015.*.com.cn (673/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 706.0 in stage 1.0 (TID 847) in 984 ms on shyt-hadoop-4026.*.com.cn (674/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 661.0 in stage 1.0 (TID 802) in 1643 ms on shyt-hadoop-4023.*.com.cn (675/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 689.0 in stage 1.0 (TID 830) in 1230 ms on shyt-hadoop-4029.*.com.cn (676/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 705.0 in stage 1.0 (TID 846) in 1056 ms on shyt-hadoop-4013.*.com.cn (677/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 713.0 in stage 1.0 (TID 854) in 960 ms on shyt-hadoop-4013.*.com.cn (678/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 717.0 in stage 1.0 (TID 858) in 921 ms on shyt-hadoop-4012.*.com.cn (679/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 587.0 in stage 1.0 (TID 728) in 2453 ms on shyt-hadoop-4018.*.com.cn (680/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 715.0 in stage 1.0 (TID 856) in 940 ms on shyt-hadoop-4011.*.com.cn (681/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 668.0 in stage 1.0 (TID 809) in 1611 ms on shyt-hadoop-4021.*.com.cn (682/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 670.0 in stage 1.0 (TID 811) in 1587 ms on shyt-hadoop-4025.*.com.cn (683/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 695.0 in stage 1.0 (TID 836) in 1240 ms on shyt-hadoop-4026.*.com.cn (684/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 707.0 in stage 1.0 (TID 848) in 1083 ms on shyt-hadoop-4029.*.com.cn (685/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 692.0 in stage 1.0 (TID 833) in 1275 ms on shyt-hadoop-4027.*.com.cn (686/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 693.0 in stage 1.0 (TID 834) in 1279 ms on shyt-hadoop-4011.*.com.cn (687/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 644.0 in stage 1.0 (TID 785) in 2013 ms on shyt-hadoop-4023.*.com.cn (688/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 682.0 in stage 1.0 (TID 823) in 1464 ms on shyt-hadoop-4019.*.com.cn (689/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 645.0 in stage 1.0 (TID 786) in 2021 ms on shyt-hadoop-4022.*.com.cn (690/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 709.0 in stage 1.0 (TID 850) in 1124 ms on shyt-hadoop-4025.*.com.cn (691/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 708.0 in stage 1.0 (TID 849) in 1136 ms on shyt-hadoop-4012.*.com.cn (692/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 696.0 in stage 1.0 (TID 837) in 1324 ms on shyt-hadoop-4029.*.com.cn (693/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 641.0 in stage 1.0 (TID 782) in 2095 ms on shyt-hadoop-4030.*.com.cn (694/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 718.0 in stage 1.0 (TID 859) in 1070 ms on shyt-hadoop-4018.*.com.cn (695/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 667.0 in stage 1.0 (TID 808) in 1779 ms on shyt-hadoop-4026.*.com.cn (696/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 674.0 in stage 1.0 (TID 815) in 1715 ms on shyt-hadoop-4019.*.com.cn (697/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 719.0 in stage 1.0 (TID 860) in 1082 ms on shyt-hadoop-4018.*.com.cn (698/720)
19/10/25 17:31:19 INFO TaskSetManager: Finished task 714.0 in stage 1.0 (TID 855) in 1122 ms on shyt-hadoop-4021.*.com.cn (699/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 694.0 in stage 1.0 (TID 835) in 1423 ms on shyt-hadoop-4026.*.com.cn (700/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 650.0 in stage 1.0 (TID 791) in 2102 ms on shyt-hadoop-4021.*.com.cn (701/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 683.0 in stage 1.0 (TID 824) in 1643 ms on shyt-hadoop-4022.*.com.cn (702/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 666.0 in stage 1.0 (TID 807) in 1890 ms on shyt-hadoop-4027.*.com.cn (703/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 704.0 in stage 1.0 (TID 845) in 1331 ms on shyt-hadoop-4011.*.com.cn (704/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 603.0 in stage 1.0 (TID 744) in 2578 ms on shyt-hadoop-4018.*.com.cn (705/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 671.0 in stage 1.0 (TID 812) in 1839 ms on shyt-hadoop-4023.*.com.cn (706/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 710.0 in stage 1.0 (TID 851) in 1291 ms on shyt-hadoop-4029.*.com.cn (707/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 664.0 in stage 1.0 (TID 805) in 1950 ms on shyt-hadoop-4018.*.com.cn (708/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 659.0 in stage 1.0 (TID 800) in 2045 ms on shyt-hadoop-4018.*.com.cn (709/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 697.0 in stage 1.0 (TID 838) in 1523 ms on shyt-hadoop-4026.*.com.cn (710/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 679.0 in stage 1.0 (TID 820) in 1746 ms on shyt-hadoop-4026.*.com.cn (711/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 653.0 in stage 1.0 (TID 794) in 2139 ms on shyt-hadoop-4030.*.com.cn (712/720)
19/10/25 17:31:20 WARN TransportChannelHandler: Exception in connection from shyt-hadoop-4014.*.com.cn/10.32.40.14:15610
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:748)
19/10/25 17:31:20 INFO YarnClientSchedulerBackend: Disabling executor 13.
19/10/25 17:31:20 INFO DAGScheduler: Executor lost: 13 (epoch 4)
19/10/25 17:31:20 INFO BlockManagerMasterEndpoint: Trying to remove executor 13 from BlockManagerMaster.
19/10/25 17:31:20 INFO BlockManagerMaster: Removed 13 successfully in removeExecutor
19/10/25 17:31:20 INFO TaskSetManager: Finished task 678.0 in stage 1.0 (TID 819) in 1813 ms on shyt-hadoop-4021.*.com.cn (713/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 681.0 in stage 1.0 (TID 822) in 1812 ms on shyt-hadoop-4021.*.com.cn (714/720)
19/10/25 17:31:20 INFO TaskSetManager: Finished task 700.0 in stage 1.0 (TID 841) in 1576 ms on shyt-hadoop-4025.*.com.cn (715/720)
19/10/25 17:31:20 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 8.6 GB of 8.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
19/10/25 17:31:20 ERROR YarnScheduler: Lost executor 13 on shyt-hadoop-4014.*.com.cn: Container killed by YARN for exceeding memory limits. 8.6 GB of 8.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
19/10/25 17:31:20 INFO DAGScheduler: Resubmitted ShuffleMapTask(1, 364), so marking it as still running
19/10/25 17:31:20 INFO DAGScheduler: Resubmitted ShuffleMapTask(1, 262), so marking it as still running
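Log 4 mixes in a different failure: java.io.IOException: Failed to create local dir in /export2/hadoop/yarn/local/... on shyt-hadoop-4014, and moments later a FileNotFoundException for a shuffle index file on the same host — both symptoms of a full or faulty disk under the YARN local-dirs there (the executor on that node was then also killed for exceeding memory limits, as in Log 3). A quick sanity check, run on the affected NodeManager, with the paths taken from the log:

df -h /export2/hadoop/yarn/local /export4/hadoop/yarn/local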

Log 5
spark-submit --master yarn-client --class Etl_dw_app --driver-memory 16g --executor-memory 16G --executor-cores 4 --num-executors 60 --conf spark.port.maxRetries=100 --conf spark.sql.shuffle.partitions=480 --conf spark.default.parallelism=480 /export6/home/*data/HadoopCommit29/log/dw_app_ana/Etl_dw_app.jar 20190926

19/10/25 17:38:45 INFO TaskSetManager: Task 199.1 in stage 9.0 (TID 3092) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
19/10/25 17:38:45 INFO DAGScheduler: Resubmitting ShuffleMapStage 6 (sql at Etl_dw_app.scala:179) and ShuffleMapStage 9 (sql at Etl_dw_app.scala:179) due to fetch failure
19/10/25 17:38:45 WARN TaskSetManager: Lost task 200.1 in stage 9.0 (TID 3091, shyt-hadoop-4011.*.com.cn): FetchFailed(null, shuffleId=2, mapId=-1, reduceId=200, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 2
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:548)
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:544)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:544)
    at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:155)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:47)
    at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:166)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

)
19/10/25 17:38:45 INFO TaskSetManager: Task 200.1 in stage 9.0 (TID 3091) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
19/10/25 17:38:45 INFO YarnScheduler: Removed TaskSet 9.0, whose tasks have all completed, from pool
19/10/25 17:38:45 WARN TaskSetManager: Lost task 179.1 in stage 9.0 (TID 3089, shyt-hadoop-4011.*.com.cn): FetchFailed(null, shuffleId=2, mapId=-1, reduceId=179, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 2
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:548)
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:544)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:544)
    at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:155)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:47)
    at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:166)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

)
19/10/25 17:38:45 INFO TaskSetManager: Task 179.1 in stage 9.0 (TID 3089) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
19/10/25 17:38:45 INFO YarnScheduler: Removed TaskSet 9.0, whose tasks have all completed, from pool
19/10/25 17:38:45 WARN TaskSetManager: Lost task 161.1 in stage 9.0 (TID 3090, shyt-hadoop-4011.*.com.cn): FetchFailed(null, shuffleId=2, mapId=-1, reduceId=161, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 2
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:548)
    at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:544)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:544)
    at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:155)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:47)
    at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:166)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

)
19/10/25 17:38:45 INFO TaskSetManager: Task 161.1 in stage 9.0 (TID 3090) failed, but another instance of the task has already succeeded, so not re-queuing the task to be re-executed.
19/10/25 17:38:45 INFO YarnScheduler: Removed TaskSet 9.0, whose tasks have all completed, from pool
19/10/25 17:38:45 INFO TaskSetManager: Finished task 0.0 in stage 12.1 (TID 3082) in 735 ms on shyt-hadoop-4023.*.com.cn (3/3)
19/10/25 17:38:45 INFO DAGScheduler: ShuffleMapStage 12 (sql at Etl_dw_app.scala:179) finished in 0.736 s
19/10/25 17:38:45 INFO YarnScheduler: Removed TaskSet 12.1, whose tasks have all completed, from pool
19/10/25 17:38:45 INFO DAGScheduler: looking for newly runnable stages
19/10/25 17:38:45 INFO DAGScheduler: running: Set()
19/10/25 17:38:45 INFO DAGScheduler: waiting: Set(ShuffleMapStage 15, ResultStage 16, ShuffleMapStage 13, ShuffleMapStage 11)
19/10/25 17:38:45 INFO DAGScheduler: failed: Set(ShuffleMapStage 9, ShuffleMapStage 6)
19/10/25 17:38:45 INFO DAGScheduler: Resubmitting failed stages

Log 6

Job aborted due to stage failure: Task 7 in stage 22.0 failed 4 times, most recent failure: Lost task 7.3 in stage 22.0 (TID 639, shyt-hadoop-4024.*.com.cn): java.lang.RuntimeException: Exchange not implemented for UnknownPartitioning(2)
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.execution.Exchange.org$apache$spark$sql$execution$Exchange$$getPartitionKeyExtractor$1(Exchange.scala:199)
    at org.apache.spark.sql.execution.Exchange$$anonfun$3.apply(Exchange.scala:209)
    at org.apache.spark.sql.execution.Exchange$$anonfun$3.apply(Exchange.scala:208)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:735)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:735)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
org.apache.spark.shuffle.FetchFailedException: Too large frame: 5647438305
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
    at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
    at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:735)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:735)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:247)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Too large frame: 5647438305
    at org.spark-project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
    at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:134)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more

Cause analysis:

Shuffle consists of two halves: shuffle write and shuffle read.
On the write side, the number of partitions is determined by the partitioning of the previous stage's RDD; on the read side, it is controlled by Spark configuration parameters.
Shuffle can be loosely understood as a built-in "save to local disk" step: each executor temporarily writes its intermediate results to local disk, partitioned by some rule, for the next stage to fetch.

For Spark SQL, the read-side partition count is governed by spark.sql.shuffle.partitions. If that value is set too small while the volume of shuffled data is large, a single task ends up having to process an enormous amount of data. That can crash the executor JVM, so the shuffle data can no longer be fetched and the executor itself is lost, which surfaces as "Failed to connect to host" errors, i.e. executor lost.
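The frame size in the error above, 5647438305 bytes (about 5.3 GB), is far beyond the roughly 2 GB (Integer.MAX_VALUE) frame limit enforced by Spark's Netty TransportFrameDecoder, which is exactly the IllegalArgumentException in the "Caused by" section: a single reduce partition's shuffle block has grown past what one fetch can carry. A minimal sketch of the parameter-side mitigation, shown with the Spark 2.x SparkSession API (the same keys apply via SparkConf on older versions; the values here are illustrative, not tuned):

import org.apache.spark.sql.SparkSession

// Illustrative settings only; tune to the actual data volume.
val spark = SparkSession.builder()
  .appName("Etl_dw_app")
  // More reduce-side partitions mean smaller shuffle blocks per fetch,
  // keeping each frame well under the ~2 GB Netty frame limit.
  .config("spark.sql.shuffle.partitions", "2000") // default is 200
  // Give slow fetches more headroom before they are declared failed.
  .config("spark.network.timeout", "300s")
  .getOrCreate()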

Solution

Once the cause is understood, the problem is straightforward to address, mainly from two angles: reduce the amount of data being shuffled, and increase the number of partitions that process the shuffled data (as sketched above). For the data-volume angle, a skewed COUNT(DISTINCT) can be split into two aggregation phases: first count distinct users inside each of 1024 hash buckets, then sum the per-bucket counts per day. (The original snippet used HASH_CODE, which is not a Spark SQL builtin; the version below uses Spark's HASH and PMOD functions and names the subquery for portability.)

SELECT day, SUM(cnt) AS cnt
FROM (
    SELECT day, COUNT(DISTINCT user_id) AS cnt
    FROM T
    -- bucket the skewed key space by a hash of user_id
    GROUP BY day, PMOD(HASH(user_id), 1024)
) t
GROUP BY day
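
Because each user_id hashes into exactly one bucket, the per-bucket distinct sets are disjoint, so summing the bucket counts gives the exact distinct count per day while spreading the heavy keys across 1024 reduce tasks. For reference, a hedged DataFrame-API sketch of the same rewrite (a DataFrame df with columns day and user_id is assumed from the SQL above):

import org.apache.spark.sql.functions.{col, countDistinct, hash, lit, pmod, sum}

// Phase 1: distinct count inside each of 1024 hash buckets.
val bucketed = df
  .groupBy(col("day"), pmod(hash(col("user_id")), lit(1024)).as("bucket"))
  .agg(countDistinct(col("user_id")).as("cnt"))

// Phase 2: buckets are disjoint, so summing is exact, not approximate.
val result = bucketed
  .groupBy(col("day"))
  .agg(sum(col("cnt")).as("cnt"))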

In Spark 3.0 these problems are largely no longer an issue: Adaptive Query Execution can coalesce small shuffle partitions and split skewed ones at runtime.
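
The relevant 3.x switches, as a sketch (these are real configuration keys, but defaults vary across 3.x releases):

// Spark 3.x adaptive execution switches (illustrative).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")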
