记一次tomcat假死问题排查
2019-11-25 本文已影响0人
猫清扬
发现问题是因为tomcat http接口都访问不了了,但是tomcat进程还在,并且日志不再滚动了。
- 首先怀疑是内存泄漏,查看errorlog看有没有OutOfMemoryException 异常
vim biz.error.log
发现没有OutOfMemoryException异常
- 那么应该是tomcat的线程出了问题,需要查看一下当前进程下线程执行状态
jstack pid > jstack_pid.log
输出线程的执行状态到日志文件
把jstack_pid.log文件下载下来用记事本打开,发现基本上所有的线程都是WAITING状态,只有GC线程是RUNNABLE
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.171-b11 mixed mode):
"Attach Listener" #1023 daemon prio=9 os_prio=0 tid=0x00007f8bfc014800 nid=0x2042 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"NettyClientPublicExecutor_4" #1020 daemon prio=5 os_prio=0 tid=0x0000000002363000 nid=0x1d3a waiting on condition [0x00007f8bc883d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000ee136860> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
....
"VM Thread" os_prio=0 tid=0x00007f8c4010b800 nid=0x15f4 runnable
"Gang worker#0 (Parallel GC Threads)" os_prio=0 tid=0x00007f8c4001b800 nid=0x15ef runnable
"Gang worker#1 (Parallel GC Threads)" os_prio=0 tid=0x00007f8c4001d800 nid=0x15f0 runnable
"Gang worker#2 (Parallel GC Threads)" os_prio=0 tid=0x00007f8c4001f000 nid=0x15f1 runnable
"Gang worker#3 (Parallel GC Threads)" os_prio=0 tid=0x00007f8c40021000 nid=0x15f2 runnable
"Concurrent Mark-Sweep GC Thread" os_prio=0 tid=0x00007f8c40064000 nid=0x15f3 runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007f8c40165800 nid=0x15fe waiting on condition
那么可以推测原因应该是GC回收不了大对象照成的
- 查看GC运行状态
jstat -gcutil pid 1000
打印出GC运行状态(每秒刷新一次)
S0 S1 E O M CCS YGC YGCT FGC FGCT GCT
0.00 0.00 91.34 100.00 85.94 82.35 7830 32.715 43582 25846.169 25878.884
0.00 0.00 91.34 100.00 85.94 82.35 7830 32.715 43582 25846.169 25878.884
0.00 0.00 92.98 100.00 85.94 82.35 7830 32.715 43582 25846.169 25878.884
0.00 0.00 94.03 100.00 85.94 82.35 7830 32.715 43582 25846.169 25878.884
0.00 0.00 94.87 100.00 85.94 82.35 7830 32.715 43582 25846.169 25878.884
0.00 0.00 95.50 100.00 85.94 82.35 7830 32.715 43582 25846.169 25878.884
果然O(老年代)已经满了,GC把所有线程资源都占了导致其他线程无法运行照成了假死状态
- 分析heap内存状态,找出具体代码问题
jmap -dump:format=b,file=./heap.hprof [pid]
生成heap文件
使用Memory Analyzer来分析
5dd4ba3fc91af.png
分析出来是订单的定时任务造成GC的问题
Inked5dd4babe7e7d1_LI.jpg
接下来去改代码就可以了。主要的方法就是先去看日志,日志找不到原因就去线程的堆栈信息里面找原因。
当然除了以上的办法还有几个有用的命令能更快地定位到线程问题
top -Hp pid
查看进程下线程的cpu和内存状态
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25246 tomcat 20 0 4435752 986.7m 7368 S 70.3 40.6 0:21.32 java //这个线程占了很高的cpu和内存
25243 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:00.00 java
25244 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:00.88 java
25245 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:20.04 java
25247 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:19.99 java
25248 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:19.98 java
25249 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 1:36.60 java
25250 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 1:47.24 java
25251 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:00.04 java
25252 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:00.04 java
25253 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:00.00 java
25254 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:00.00 java
25255 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:00.44 java
25256 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:00.07 java
25258 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:00.59 java
25259 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 1:05.06 java
25260 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 0:59.96 java
25261 tomcat 20 0 4435752 986.7m 7368 S 0.0 12.6 1:06.67 java
25336 tomcat 20 0 4435752 986.7m 7368 S 0.7 12.6 0:42.12 java
ps -mp pid -o THREAD,tid,time
查看当前进程下所有线程的耗时
USER %CPU PRI SCNT WCHAN USER SYSTEM TID TIME
tomcat 9.6 - - - - - - 00:28:07
tomcat 0.0 19 - futex_ - - 25243 00:00:00
tomcat 0.0 19 - poll_s - - 25244 00:00:00
tomcat 0.1 19 - futex_ - - 25245 00:00:20
tomcat 0.1 19 - futex_ - - 25246 01:00:21 //这个线程执行了一个多小时,很可能有问题
tomcat 0.1 19 - futex_ - - 25247 00:00:19
tomcat 0.1 19 - futex_ - - 25248 00:00:19
tomcat 0.5 19 - futex_ - - 25249 00:01:36
tomcat 0.6 19 - futex_ - - 25250 00:01:47
tomcat 0.0 19 - futex_ - - 25251 00:00:00
tomcat 0.0 19 - futex_ - - 25252 00:00:00
tomcat 0.0 19 - futex_ - - 25253 00:00:00
tomcat 0.0 19 - futex_ - - 25254 00:00:00
tomcat 0.0 19 - futex_ - - 25255 00:00:00
tomcat 0.0 19 - futex_ - - 25256 00:00:00
tomcat 0.0 19 - ep_pol - - 25258 00:00:00
tomcat 0.3 19 - futex_ - - 25259 00:01:05
tomcat 0.3 19 - futex_ - - 25260 00:00:59
printf “%x\n” 25246
转换线程id为16进制格式 629en
jstack pid |grep 629en -A 30
输出该线程的30行堆栈信息
"http-nio2-8089-exec-195" #344 daemon prio=5 os_prio=0 tid=0x00007fa8980de000 nid=0x6265 runnable [0x00007fa85e6d4000]
java.lang.Thread.State: RUNNABLE
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at org.apache.shiro.web.util.WebUtils.normalize(WebUtils.java:194)
at org.apache.shiro.web.util.WebUtils.normalize(WebUtils.java:155)
at org.apache.shiro.web.util.WebUtils.getRequestUri(WebUtils.java:140)
at org.apache.shiro.web.util.WebUtils.getPathWithinApplication(WebUtils.java:112)
at org.apache.shiro.web.filter.mgt.PathMatchingFilterChainResolver.getPathWithinApplication(PathMatchingFilterChainResolver.java:147)
at org.apache.shiro.web.filter.mgt.PathMatchingFilterChainResolver.getChain(PathMatchingFilterChainResolver.java:99)
at org.apache.shiro.web.servlet.AbstractShiroFilter.getExecutionChain(AbstractShiroFilter.java:415)
at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:448)
at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at com.github.common.component.biz.Biz.doBizMethod(Biz.java:30)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.boot.actuate.trace.WebRequestTraceFilter.doFilterInternal(WebRequestTraceFilter.java:111)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
...