分析gc日志并进行调优
在我们正常的开发中很少会接触gc调优的事情。gc调优一般都是发生在项目开始之前。而项目开始之后如果发生oom或者full gc时间太长,导致系统卡顿等问题,这时候最好先找自己代码的问题。
这段的意思就是打印gc的详情,输出gc时间戳,使用ParNew+CMS+SerialOld进行组合回收,当CMS失败时,SerialOld作为后备回收器。将gc日志输出到指定目录中。
然后启动项目。获取gc日志的信息。截取其中的一段来进行分析:
CommandLine flags: -XX:-BytecodeVerificationLocal -XX:-BytecodeVerificationRemote -XX:InitialHeapSize=134217728 -XX:+ManagementServer -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:TieredStopAtLevel=1 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
0.992: [GC (Allocation Failure) [PSYoungGen: 33280K->3941K(38400K)] 33280K->4013K(125952K), 0.0046700 secs] [Times: user=0.01 sys=0.01, real=0.00 secs]
1.239: [GC (Allocation Failure) [PSYoungGen: 37221K->4387K(38400K)] 37293K->4467K(125952K), 0.0060482 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
1.615: [GC (Allocation Failure) [PSYoungGen: 37667K->4719K(38400K)] 37747K->4807K(125952K), 0.0045716 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
1.758: [GC (Allocation Failure) [PSYoungGen: 37999K->5101K(71680K)] 38087K->5452K(159232K), 0.0066052 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2.040: [GC (Allocation Failure) [PSYoungGen: 71661K->5106K(71680K)] 72012K->6515K(159232K), 0.0089319 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2.216: [GC (Allocation Failure) [PSYoungGen: 71666K->6582K(139264K)] 73075K->7999K(226816K), 0.0084832 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2.446: [GC (Metadata GC Threshold) [PSYoungGen: 69050K->5044K(139776K)] 70468K->9525K(227328K), 0.0104451 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
2.457: [Full GC (Metadata GC Threshold) [PSYoungGen: 5044K->0K(139776K)] [ParOldGen: 4481K->7961K(52736K)] 9525K->7961K(192512K), [Metaspace: 20638K->20638K(1067008K)], 0.0291630 secs] [Times: user=0.08 sys=0.01, real=0.03 secs]
接下来挑几行日志来分析一下各个参数到底是什么意思。
1.CommandLine flags指的是java的启动参数。
- 然后第一段gc的详情日志:0.992: [GC (Allocation Failure) [PSYoungGen: 33280K->3941K(38400K)] 33280K->4013K(125952K), 0.0046700 secs] [Times: user=0.01 sys=0.01, real=0.00 secs]
Allocation Failure 指的是内存分配失败,所以进行gc。
PSYoungGen指的是进行了新生代的gc,内存从33280K到了3941K,新生代总内存为38400K。整个堆的大小从33280K到了4013K,堆的总内存为125952K。总共耗时4毫秒。
3.然后分析发生full gc的那一行日志:2.457: [Full GC (Metadata GC Threshold) [PSYoungGen: 5044K->0K(139776K)] [ParOldGen: 4481K->7961K(52736K)] 9525K->7961K(192512K), [Metaspace: 20638K->20638K(1067008K)], 0.0291630 secs] [Times: user=0.08 sys=0.01, real=0.03 secs]
发生了full gc,导致full gc的原因是metadata 分配,其他参数跟上面参数一样,Metaspace这个参数表示发生gc前metadata大小为20638K,gc后metadata的大小为20638K,总大小为1067008K。
可以将gc日志上传至https://gceasy.io这个网站上,会给你生成可视化的gc的情况。类似如下页面:
注意,红框标记的意思是吞吐量的意思,吞吐量越高,gc的性能越好。
然后我们来进行调优一下,调优一般就是改动一些参数,如metadata的大小,新生代的大小等等。一个一个试。上述的gc日志,表明了metadata区可能太小了,因为我们如果不指定metadata的区域的大小的话,他的最小值是我们机器内存的1/64,最大值是我们机器的1/4。我们先给他指定一下大小,加如下参数 -XX:MetaspaceSize=1g
-XX:MaxMetaspaceSize=1g
然后再分析一下gc的日志:
图片.png
可以看到吞吐量提高了,并且平均gc响应跟最大gc响应都变小了。
然后我们再指定一下用cms收集器看看:加这个参数-XX:+UseConcMarkSweepGC
6.228: [GC (CMS Initial Mark) [1 CMS-initial-mark: 44081K(87424K)] 47207K(126720K), 0.0009259 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
6.229: [CMS-concurrent-mark-start]
6.278: [CMS-concurrent-mark: 0.050/0.050 secs] [Times: user=0.09 sys=0.01, real=0.05 secs]
6.278: [GC (Allocation Failure) 6.278: [ParNew: 37354K->2663K(39296K), 0.0025064 secs] 81435K->46744K(126720K), 0.0025354 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
6.281: [CMS-concurrent-preclean-start]
6.282: [CMS-concurrent-preclean: 0.001/0.001 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
6.282: [GC (CMS Final Remark) [YG occupancy: 3362 K (39296 K)]6.282: [Rescan (parallel) , 0.0012354 secs]6.283: [weak refs processing, 0.0003721 secs]6.284: [class unloading, 0.0036852 secs]6.287: [scrub symbol table, 0.0087614 secs]6.296: [scrub string table, 0.0004252 secs][1 CMS-remark: 44081K(87424K)] 47443K(126720K), 0.0150513 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
6.297: [CMS-concurrent-sweep-start]
6.312: [CMS-concurrent-sweep: 0.014/0.014 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
6.312: [CMS-concurrent-reset-start]
6.314: [CMS-concurrent-reset: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
可以看到cms收集器的各个阶段,初始标记,并发标记,最终标记,然后清除等过程。
然后看下当前的gc的效果图:
图片.png
可以看到吞吐量达到了99.208%。比之前高了1.5个百分点。
再试试使用g1收集器是怎么样的:-XX:+UseG1GC
g1的gc日志跟其他的日志有很大的不同:
0.729: [GC pause (G1 Evacuation Pause) (young), 0.0044414 secs]
[Parallel Time: 2.4 ms, GC Workers: 4]
[GC Worker Start (ms): Min: 729.7, Avg: 729.7, Max: 729.7, Diff: 0.0]
[Ext Root Scanning (ms): Min: 0.2, Avg: 0.4, Max: 0.8, Diff: 0.6, Sum: 1.5]
[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
[Object Copy (ms): Min: 1.4, Avg: 1.8, Max: 2.1, Diff: 0.6, Sum: 7.2]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 2.3, Avg: 2.3, Max: 2.3, Diff: 0.0, Sum: 9.0]
[GC Worker End (ms): Min: 732.0, Avg: 732.0, Max: 732.0, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 1.9 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.0 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.0 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.0 ms]
[Eden: 14.0M(14.0M)->0.0B(19.0M) Survivors: 0.0B->2048.0K Heap: 14.0M(128.0M)->2650.0K(128.0M)]
[Times: user=0.01 sys=0.01, real=0.00 secs]
使用g1后发现吞吐量下降了,但是平均gc时间跟最大gc时间都降低了很多。因为g1是低延迟垃圾回收器,不是以高吞吐量为目的的。
图片.png
当然真正线上的参数肯定比这里例举的多很多,不过gc调优的大概思路就是这样,根据分析结果,调整各个参数。