Netty内存泄漏问题排查
2019-05-23 本文已影响0人
育王净量
1. 问题描述
手机收到告警 “主机内存不足, 使用率高达 90.31%, 使用了 3.49G”。线上机器查看日志,显示如下:
2019-05-22 13:42:55,241 WARN [nioEventLoopGroup-4-4] i.n.c.AbstractChannelHandlerContext:146 - [] [] [bds=] An exception '{}' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2063597575, max: 2068840448)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:640)
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:594)
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:214)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:146)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:137)
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:80)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:122)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
2.分析
看日志报错原因,内存池中对象被分配使用后未进行回收,导致不断新增对象直至到达JVM限制的临界值。怀疑有堆外内存泄漏,对象没有释放导致的。用命令看下JVM核心参数的配置如下:jps -l 、jmap -heap 15345
$ jmap -heap 15345
Attaching to process ID 15345, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.144-b01
using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 2147483648 (2048.0MB)
NewSize = 786432000 (750.0MB)
MaxNewSize = 786432000 (750.0MB)
OldSize = 1361051648 (1298.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 39845888 (38.0MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 398458880 (380.0MB)
G1HeapRegionSize = 0 (0.0MB)
进一步查找日志,发现有如下日志,“LEAK: ByteBuf.release() was not called before it's garbage-collected.”。怀疑是ByteBuf没有释放导致的堆外内存持续增加,直至无法分配内存。
2019-05-20 14:27:44,760 ERROR [AsyncHttpClient-2-8] i.n.util.ResourceLeakDetector:171 - [] [] [bds=] LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
2019-05-20 14:27:44,760 INFO [nioEventLoopGroup-4-8] c.w.o.i.n.CompleteUserInfoHandler:149 - [] [] [bds=] Invoke user center result = CompleteUserInfoHandler.WacUserApiResponse(success=true, errCode=null, errMsg=null, result=1)
使用jmap -histo:live 15345 > a.log 分析目前堆中的对象:
包含对象序号、某个对象示例数、当前对象所占内存的大小、当前对象的全限定名,如下图:
num #instances #bytes class name
----------------------------------------------
1: 11309 152795888 [B
2: 252460 20504272 [J
3: 252367 16151488 io.netty.buffer.PoolSubpage
4: 12253 6155712 [Ljava.lang.Object;
5: 42217 4164088 [C
6: 163 1078064 [Lio.netty.buffer.PoolSubpage;
7: 1560 1023360 io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
8: 41770 1002480 java.lang.String
9: 8835 985352 java.lang.Class
10: 1248 818688 io.netty.util.internal.shaded.org.jctools.queues.MpscArrayQueue
11: 20378 652096 java.util.concurrent.ConcurrentHashMap$Node
12: 5781 508032 [I
13: 11192 358144 java.util.HashMap$Node
14: 22009 352144 java.lang.Object
15: 3316 291808 java.lang.reflect.Method
16: 2852 276560 [Ljava.util.HashMap$Node;
17: 150 267952 [Ljava.util.concurrent.ConcurrentHashMap$Node;
18: 5750 230000 java.util.LinkedHashMap$Entry
19: 81 194320 [Ljava.nio.ByteBuffer;
20: 50 185200 [Ljava.nio.channels.SelectionKey;
21: 2909 162904 java.util.LinkedHashMap
22: 4085 130720 com.wac.metric.bds.sdk.common.loader.BdsClassloader$ByteCode
23: 168 113280 [Lio.netty.util.Recycler$DefaultHandle;
24: 4630 111120 java.util.ArrayList
25: 971 108752 java.net.SocksSocketImpl
26: 2192 87680 java.lang.ref.Finalizer
27: 1448 84752 [Ljava.lang.String;
28: 3266 78384 javassist.bytecode.Utf8Info
29: 1469 70512 java.util.HashMap
30: 40 68480 [Lio.netty.buffer.PoolSubpage;
31: 2056 65792 com.codahale.metrics.WeightedSnapshot$WeightedSample
32: 2 65568 [Lcom.alibaba.fastjson.util.IdentityHashMap$Entry;
33: 60 62400 [Lio.netty.util.Recycler$DefaultHandle;
34: 1479 59160 java.lang.ref.SoftReference
35: 1279 51160 java.util.TreeMap$Entry
36: 2058 49392 java.util.concurrent.ConcurrentSkipListMap$Node
37: 2057 49368 java.lang.Double
38: 2270 48208 [Ljava.lang.Class;
39: 1472 47104 java.util.Hashtable$Entry
40: 967 46416 java.net.SocketInputStream
41: 967 46416 java.net.SocketOutputStream
42: 1440 46080 io.netty.buffer.PoolThreadCache$SubPageMemoryRegionCache
重现问题
通过设置netty的堆外内存检测级别来重现问题。
java -jar -Dio.netty.leakDetection.level=advanced *.jar
解决方案如下,
try {
doHttpRequest(xx);
} finally {
ReferenceCountUtil.release(msg);
}
谁来负责Release?在Netty里,因为Handler链的存在,ByteBuf经常要传递到下一个Hanlder去而不复还,所以规则变成了谁是最后使用者,谁负责释放。另外,更要注意的是各种异常情况,ByteBuf没有成功传递到下一个Hanlder,还在自己地界里的话,一定要进行释放。根据谁最后谁负责原则,每个Handler对消息可能有三种处理方式
- 对原消息不做处理,调用 ctx.fireChannelRead(msg)把原消息往下传,那不用做什么释放。
- 将原消息转化为新的消息并调用 ctx.fireChannelRead(newMsg)往下传,那必须把原消息release掉。
- 如果已经不再调用ctx.fireChannelRead(msg)传递任何消息,那更要把原消息release掉。
假设每一个Handler都把消息往下传,Handler并也不知道谁是启动Netty时所设定的Handler链的最后一员,所以Netty在Handler链的最末补了一个TailHandler,如果此时消息仍然是ReferenceCounted类型就会被release掉。
要发送的消息由应用所创建,并调用 ctx.writeAndFlush(msg) 进入Handler链。在每个Handler中的处理类似InBound Message,最后消息会来到HeadHandler,再经过一轮复杂的调用,在flush完成后终将被release掉。
多层的异常处理机制,有些异常处理的地方不一定准确知道ByteBuf之前释放了没有,可以在释放前加上引用计数大于0的判断避免释放失败;有时候不清楚ByteBuf被引用了多少次,但又必须在此进行彻底的释放,可以循环调用reelase()直到返回true。