A Flink Cluster Disaster Caused by Disk I/O Problems

2025-05-07  后来丶_a24d

Contents


Background


What Happened

2025-05-07 22:38:04,382 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job xxx(xxxx) switched from state RUNNING to FAILING.
java.util.concurrent.TimeoutException: Heartbeat of TaskManager with id container_xxxxxxxxx timed out.
        at org.apache.flink.runtime.jobmaster.JobMaster$TaskManagerHeartbeatListener.notifyHeartbeatTimeout(JobMaster.java:1656)
        at org.apache.flink.runtime.heartbeat.HeartbeatManagerImpl$HeartbeatMonitor.run(HeartbeatManagerImpl.java:339)

Conclusion

 grep "May  7 22:" /var/log/messages
[Figure: 超时.png (timeout)]
[Figure: ioutil.png]
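
The grep pattern above is the hour of the heartbeat timeout rewritten in the traditional syslog timestamp style. A minimal sketch of that conversion follows. It assumes GNU date (for the -d option) and a syslog that uses the "Mon dD HH:MM:SS" prefix. The function name syslog_prefix is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): convert a Flink log
# timestamp such as "2025-05-07 22:38:04,382" into the syslog prefix for
# the same hour, e.g. "May  7 22:".
# Assumptions: GNU date is available, and the system log uses the
# traditional syslog timestamp with a space-padded day of month.
syslog_prefix() {
    local ts="${1%%,*}"                   # drop the ",382" millisecond suffix
    LC_ALL=C date -d "$ts" '+%b %e %H:'   # %e pads single-digit days with a space
}

# Usage, with the same log path as the original command:
# grep "$(syslog_prefix '2025-05-07 22:38:04,382')" /var/log/messages
```

Deriving the prefix from the Flink timestamp avoids miscounting the two spaces that syslog puts before a single-digit day.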

Solution
