Netty内存泄漏问题排查

2019-05-23  本文已影响0人  育王净量

1. 问题描述

手机收到告警 “主机内存不足, 使用率高达 90.31%, 使用了 3.49G”。线上机器查看日志,显示如下:


2019-05-22 13:42:55,241 WARN  [nioEventLoopGroup-4-4] i.n.c.AbstractChannelHandlerContext:146 - [] [] [bds=] An exception '{}' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2063597575, max: 2068840448)
    at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:640)
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:594)
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:214)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:146)
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
    at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
    at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:80)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:122)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)


2.分析

看日志报错原因,内存池中对象被分配使用后未进行回收,导致不断新增对象直至到达JVM限制的临界值。怀疑有堆外内存泄漏,对象没有释放导致的。用命令看下JVM核心参数的配置如下:jps -l 、jmap -heap 15345


$ jmap -heap 15345
Attaching to process ID 15345, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.144-b01

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 2147483648 (2048.0MB)
   NewSize                  = 786432000 (750.0MB)
   MaxNewSize               = 786432000 (750.0MB)
   OldSize                  = 1361051648 (1298.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 39845888 (38.0MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 398458880 (380.0MB)
   G1HeapRegionSize         = 0 (0.0MB)


进一步查找日志,发现有如下日志,“LEAK: ByteBuf.release() was not called before it's garbage-collected.”。怀疑是ByteBuf没有释放导致的堆外内存持续增加,直至无法分配内存。


2019-05-20 14:27:44,760 ERROR [AsyncHttpClient-2-8] i.n.util.ResourceLeakDetector:171 - [] [] [bds=] LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
2019-05-20 14:27:44,760 INFO  [nioEventLoopGroup-4-8] c.w.o.i.n.CompleteUserInfoHandler:149 - [] [] [bds=] Invoke user center result = CompleteUserInfoHandler.WacUserApiResponse(success=true, errCode=null, errMsg=null, result=1)

使用jmap -histo:live 15345 > a.log 分析目前堆中的对象:
包含对象序号、某个对象示例数、当前对象所占内存的大小、当前对象的全限定名,如下图:


 num     #instances         #bytes  class name
----------------------------------------------
   1:         11309      152795888  [B
   2:        252460       20504272  [J
   3:        252367       16151488  io.netty.buffer.PoolSubpage
   4:         12253        6155712  [Ljava.lang.Object;
   5:         42217        4164088  [C
   6:           163        1078064  [Lio.netty.buffer.PoolSubpage;
   7:          1560        1023360  io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
   8:         41770        1002480  java.lang.String
   9:          8835         985352  java.lang.Class
  10:          1248         818688  io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
  11:         20378         652096  java.util.concurrent.ConcurrentHashMap$Node
  12:          5781         508032  [I
  13:         11192         358144  java.util.HashMap$Node
  14:         22009         352144  java.lang.Object
  15:          3316         291808  java.lang.reflect.Method
  16:          2852         276560  [Ljava.util.HashMap$Node;
  17:           150         267952  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  18:          5750         230000  java.util.LinkedHashMap$Entry
  19:            81         194320  [Ljava.nio.ByteBuffer;
  20:            50         185200  [Ljava.nio.channels.SelectionKey;
  21:          2909         162904  java.util.LinkedHashMap
  22:          4085         130720  com.wac.metric.bds.sdk.common.loader.BdsClassloader$ByteCode
  23:           168         113280  [Lio.netty.util.Recycler$DefaultHandle;
  24:          4630         111120  java.util.ArrayList
  25:           971         108752  java.net.SocksSocketImpl
  26:          2192          87680  java.lang.ref.Finalizer
  27:          1448          84752  [Ljava.lang.String;
  28:          3266          78384  javassist.bytecode.Utf8Info
  29:          1469          70512  java.util.HashMap
  30:            40          68480  [Lio.netty.buffer.PoolSubpage;
  31:          2056          65792  com.codahale.metrics.WeightedSnapshot$WeightedSample
  32:             2          65568  [Lcom.alibaba.fastjson.util.IdentityHashMap$Entry;
  33:            60          62400  [Lio.netty.util.Recycler$DefaultHandle;
  34:          1479          59160  java.lang.ref.SoftReference
  35:          1279          51160  java.util.TreeMap$Entry
  36:          2058          49392  java.util.concurrent.ConcurrentSkipListMap$Node
  37:          2057          49368  java.lang.Double
  38:          2270          48208  [Ljava.lang.Class;
  39:          1472          47104  java.util.Hashtable$Entry
  40:           967          46416  java.net.SocketInputStream
  41:           967          46416  java.net.SocketOutputStream
  42:          1440          46080  io.netty.buffer.PoolThreadCache$SubPageMemoryRegionCache

重现问题

通过设置netty的堆外内存检测级别来重现问题。

java -jar -Dio.netty.leakDetection.level=advanced *.jar

解决方案如下,

        try {
            doHttpRequest(xx);
        } finally {
            ReferenceCountUtil.release(msg);
        }

谁来负责Release?在Netty里,因为Handler链的存在,ByteBuf经常要传递到下一个Hanlder去而不复还,所以规则变成了谁是最后使用者,谁负责释放。另外,更要注意的是各种异常情况,ByteBuf没有成功传递到下一个Hanlder,还在自己地界里的话,一定要进行释放。根据谁最后谁负责原则,每个Handler对消息可能有三种处理方式

参考文档

上一篇 下一篇

猜你喜欢

热点阅读