flink

Flink异常指南

2018-11-02  本文已影响541人  远o_O
        env.enableCheckpointing(1000L)
        val checkpointConf = env.getCheckpointConfig
        checkpointConf.setMinPauseBetweenCheckpoints(30000L)
        checkpointConf.setCheckpointTimeout(8000L)

原因是因为checkpointConf.setCheckpointTimeout(8000L)设置的太小了,默认是10min,这里只设置了8sec。当一个Flink App背压的时候(例如由外部组件异常引起),Barrier会流动的非常缓慢,导致Checkpoint时长飙升。

org.apache.flink.util.FlinkException: The assigned slot container_e08_1539148828017_15937_01_003564_0 was removed.
    at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlot(SlotManager.java:786)
    at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.removeSlots(SlotManager.java:756)
    at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.internalUnregisterTaskManager(SlotManager.java:948)
    at org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager.unregisterTaskManager(SlotManager.java:372)
    at org.apache.flink.runtime.resourcemanager.ResourceManager.closeTaskManagerConnection(ResourceManager.java:803)
    at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:340)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
    at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
    at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:296)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.ChannelException: failed to open a new selector
    at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:156)
    at io.netty.channel.nio.NioEventLoop.<init>(NioEventLoop.java:147)
    at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:126)
    at io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:36)
    at io.netty.util.concurrent.MultithreadEventExecutorGroup.<init>(MultithreadEventExecutorGroup.java:84)
    ... 21 more
Caused by: java.io.IOException: Too many open files
    at sun.nio.ch.IOUtil.makePipe(Native Method)
    at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:65)
    at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
    at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:154)
    ... 25 more
上一篇下一篇

猜你喜欢

热点阅读