Analysis of a NameNode Full GC Causing Cluster Exit, and a JournalNode Restart Failure
2021-09-02
邵红晓
Metadata flow diagram (figure not reproduced here)

Problem 1
During the AsyncLogger asynchronous flush to the JournalNodes, an IOException occurred. The timeout check behind this IOException is based on reading the system clock; a Full GC happened right before the clock was read, so the wait appeared to exceed the timeout, terminate was called, and the NameNode exited directly.
- Call-chain analysis
JournalSet.mapJournalsAndReportErrors() -> flushAndSync()
-> QuorumOutputStream.flushAndSync() (the QJM implementation)
-> loggers.waitForWriteQuorum(qcall, writeTimeoutMs, "sendEdits");
-> q.waitFor(
       loggers.size(), // either all respond (5)
       majority,       // or we get a majority of successes (3)
       majority,       // or we get a majority of failures (3)
       timeoutMs, operationName); // timeoutMs = 20 s
-> timeoutMs = 20 s: the JournalNode requests must return a result within 20 s, otherwise it is a timeout
QuorumCall.waitFor():

```java
public synchronized void waitFor(
    int minResponses, int minSuccesses, int maxExceptions,
    int millis, String operationName)
    throws InterruptedException, TimeoutException {
  // millis = 20 s: the method must return within 20 s, meaning either
  // enough JournalNodes acknowledged the write or enough failed
  long st = Time.monotonicNow();
  long nextLogTime = st + (long)(millis * WAIT_PROGRESS_INFO_THRESHOLD);
  // compute the absolute timeout deadline
  long et = st + millis;
  while (true) { // retry until one of the exit conditions holds
    checkAssertionErrors();
    // Idea: whether a JournalNode write succeeds or fails, it must respond.
    // Case 1: successes + failures >= 5 -- all 5 JournalNodes responded,
    // one way or the other (countResponses() counts responses received)
    if (minResponses > 0 && countResponses() >= minResponses) return;
    // Case 2: at least 3 JournalNodes wrote successfully -- return immediately
    if (minSuccesses > 0 && countSuccesses() >= minSuccesses) return;
    // Case 3: enough JournalNodes failed that a quorum of successes
    // is impossible -- also return immediately
    if (maxExceptions >= 0 && countExceptions() > maxExceptions) return;
    long now = Time.monotonicNow();
    if (now > nextLogTime) {
      long waited = now - st;
      // log progress
      String msg = String.format(
          "Waited %s ms (timeout=%s ms) for a response for %s",
          waited, millis, operationName);
      if (!successes.isEmpty()) {
        msg += ". Succeeded so far: [" + Joiner.on(",").join(successes.keySet()) + "]";
      }
      if (!exceptions.isEmpty()) {
        msg += ". Exceptions so far: [" + getExceptionMapString() + "]";
      }
      if (successes.isEmpty() && exceptions.isEmpty()) {
        msg += ". No responses yet.";
      }
      // warn once the wait passes the warning threshold, otherwise info
      if (waited > millis * WAIT_PROGRESS_WARN_THRESHOLD) {
        QuorumJournalManager.LOG.warn(msg);
      } else {
        QuorumJournalManager.LOG.info(msg);
      }
      // schedule the next progress log, e.g. 16:00:07 + 1 s = 16:00:08
      nextLogTime = now + WAIT_PROGRESS_INTERVAL_MILLIS;
    }
    long rem = et - now;
    // past the deadline -- this is the key line
    if (rem <= 0) {
      throw new TimeoutException();
    }
    rem = Math.min(rem, nextLogTime - now);
    rem = Math.max(rem, 1);
    wait(rem);
  }
}
```
JournalSet.mapJournalsAndReportErrors():

```java
for (JournalAndStream jas : journals) {
  try {
    // e.g. FileJournalManager
    closure.apply(jas);
  } catch (Throwable t) {
    if (jas.isRequired()) {
      final String msg = "Error: " + status + " failed for required journal ("
          + jas + ")";
      LOG.fatal(msg, t);
      // If we fail on *any* of the required journals, then we must not
      // continue on any of the other journals. Abort them to ensure that
      // retry behavior doesn't allow them to keep going in any way.
      abortAllJournals();
      // the current policy is to shutdown the NN on errors to shared edits
      // dir. There are many code paths to shared edits failures - syncs,
      // roll of edits etc. All of them go through this common function
      // where the isRequired() check is made. Applying exit policy here
      // to catch all code paths.
      terminate(1, msg);
    } else {
      LOG.error("Error: " + status + " failed for (journal " + jas + ")", t);
      badJAS.add(jas);
    }
  }
}
```
Summary
- From the source-code analysis
1. Focus on the line `if (rem <= 0) { throw new TimeoutException(); }`, together with `long now = Time.monotonicNow();` and `long rem = et - now;` inside the loop. If a Full GC occurs before `now` is assigned in the `while (true)` loop and the accumulated wait exceeds 20 s, a TimeoutException is thrown here. It propagates up to JournalSet.mapJournalsAndReportErrors, which catches it and calls terminate, so the NameNode exits directly.
- Solutions
1. Patch the source so the timeout check measures actual code execution time, excluding GC pauses.
2. (Recommended) Tune the JVM instead: increase the heap, switch to the G1 collector, and set an expected maximum stop-the-world pause (in ms) via the startup flags:
-Xmx4096M -Xms4096M -XX:+UseG1GC -XX:MaxGCPauseMillis=100
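To make the failure mode concrete, here is a minimal self-contained sketch (not Hadoop code; the class and method names are hypothetical) of the deadline check in QuorumCall.waitFor. The deadline is computed once from the first clock read, so any stop-the-world pause between subsequent clock reads is charged against the 20 s budget even though no JournalNode was actually slow:

```java
import java.util.concurrent.TimeoutException;

public class GcTimeoutDemo {
    // Simplified model of waitFor's deadline logic: the deadline is fixed
    // up front, and each loop iteration re-reads the clock. A long pause
    // (e.g. a Full GC) between reads eats the timeout budget.
    static void checkDeadline(long startMs, long timeoutMs, long nowMs)
            throws TimeoutException {
        long et = startMs + timeoutMs;   // absolute deadline, like et = st + millis
        long rem = et - nowMs;           // remaining budget, like rem = et - now
        if (rem <= 0) {
            throw new TimeoutException("waited " + (nowMs - startMs) + " ms");
        }
    }

    public static void main(String[] args) {
        long start = 0;
        long timeout = 20_000;           // 20 s, like writeTimeoutMs
        try {
            // a normal loop pass 5 s in: plenty of budget left
            checkDeadline(start, timeout, start + 5_000);
            System.out.println("5s elapsed: still waiting");
            // simulate a 25 s stop-the-world pause before the next clock read
            checkDeadline(start, timeout, start + 5_000 + 25_000);
        } catch (TimeoutException e) {
            System.out.println("TimeoutException: " + e.getMessage());
        }
    }
}
```

With a real `Time.monotonicNow()` the same thing happens: the JournalNodes may all be healthy, yet the NameNode times out purely because its own GC pause consumed the window.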
Problem 2
JournalNode 30 was restarted; after a 6-minute interval, node 31; after another 6 minutes, node 32. The NameNode then logged:
2022-01-12 13:21:29,779 WARN client.QuorumJournalManager (IPCLoggerChannel.java:call(388)) - Remote journal 10.32.40.30:8485 failed to write txns 2887442538-2887442538. Will try to write to this JN again after the next log roll.
java.io.EOFException: End of File Exception between local host is: "shyt-hadoop-4031.xxx.com.cn/10.32.40.31"; destination host is: "shyt-hadoop-4030.xxx.com.cn":8485; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1558)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy11.journal(Unknown Source)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:167)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:385)
at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:378)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1119)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1014)
2022-01-12 13:21:29,796 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.32.40.32:8485, 10.32.40.30:8485, 10.32.40.31:8485], stream=QuorumOutputStream starting at txid 2887441291))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 1 successful responses:
10.32.40.31:8485: null [success]
2 exceptions thrown:
10.32.40.30:8485: End of File Exception between local host is: "shyt-hadoop-4031.xxx.com.cn/10.32.40.31"; destination host is: "shyt-hadoop-4030.xxx.com.cn":8485; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
10.32.40.32:8485: /export/hadoop/hdfs/journal/xxxdata/current/last-promised-epoch.tmp (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
at org.apache.hadoop.hdfs.util.AtomicFileOutputStream.<init>(AtomicFileOutputStream.java:58)
at org.apache.hadoop.hdfs.util.PersistentLongFile.writeFile(PersistentLongFile.java:78)
at org.apache.hadoop.hdfs.util.PersistentLongFile.set(PersistentLongFile.java:64)
at org.apache.hadoop.hdfs.qjournal.server.Journal.updateLastPromisedEpoch(Journal.java:326)
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:434)
at org.apache.hadoop.hdfs.qjournal.server.Journal.heartbeat(Journal.java:417)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.heartbeat(JournalNodeRpcServer.java:159)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.heartbeat(QJournalProtocolServerSideTranslatorPB.java:172)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25423)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:707)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:641)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2721)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2586)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:736)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:409)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
2022-01-12 13:21:29,796 WARN client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 2887441291
2022-01-12 13:21:29,814 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
Analysis
Of the 3 JournalNodes, only one responded successfully:
- node 30 was being restarted, so failing to connect to it is expected;
- node 31 connected successfully;
- node 32 failed with an exception: its last-promised-epoch.tmp file could not be created because the journal storage directory was missing.
When the NameNode pushes edit-log entries to the JournalNodes, a strict majority must succeed, i.e. 2 of the 3 nodes must return success; otherwise the NameNode calls terminate and exits.
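The majority rule can be sketched with a small hypothetical helper (not a Hadoop class, though Hadoop computes the write quorum the same way, as n / 2 + 1):

```java
public class WriteQuorum {
    // A write quorum is a strict majority of the JournalNode set.
    static int majority(int journalNodes) {
        return journalNodes / 2 + 1;
    }

    // A flush can still be acknowledged only while enough non-failed
    // JournalNodes remain to form a majority; once failures make that
    // impossible, the NameNode gives up and terminates.
    static boolean flushCanStillSucceed(int journalNodes, int failures) {
        return journalNodes - failures >= majority(journalNodes);
    }

    public static void main(String[] args) {
        System.out.println(majority(3));                // 2
        System.out.println(flushCanStillSucceed(3, 1)); // true: 2 nodes can still ack
        System.out.println(flushCanStillSucceed(3, 2)); // false: the situation in the log
    }
}
```

In the log above, nodes 30 and 32 both failed, so only 1 of 3 could succeed, the 2/3 quorum was unreachable, and the NameNode exited with status 1.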
Solution
Re-initialize the JournalNode storage directory on node 32:
1. sudo su - hdfs && mkdir -p /export/hadoop/hdfs/journal/xxxdata/current/
2. Copy /export/hadoop/hdfs/journal/xxxdata/current/VERSION from node 31 to node 32.
3. Restart the JournalNode service on node 32. Problem solved.