记一次tomcat假死问题排查

2019-11-25  本文已影响0人  猫清扬

发现问题是因为tomcat http接口都访问不了了,但是tomcat进程还在,并且日志不再滚动了。

  1. 首先怀疑是内存泄漏,查看errorlog看有没有OutOfMemoryException 异常

vim biz.error.log发现没有OutOfMemoryException异常

  1. 那么应该是tomcat的线程出了问题,需要查看一下当前进程下线程执行状态

jstack pid > jstack_pid.log输出线程的执行状态到日志文件

把jstack_pid.log文件下载下来用记事本打开,发现基本上所有的线程都是WAITING状态,只有GC线程是RUNNABLE

Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.171-b11 mixed mode):

"Attach Listener" #1023 daemon prio=9 os_prio=0 tid=0x00007f8bfc014800 nid=0x2042 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"NettyClientPublicExecutor_4" #1020 daemon prio=5 os_prio=0 tid=0x0000000002363000 nid=0x1d3a waiting on condition [0x00007f8bc883d000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000ee136860> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

....

"VM Thread" os_prio=0 tid=0x00007f8c4010b800 nid=0x15f4 runnable 

"Gang worker#0 (Parallel GC Threads)" os_prio=0 tid=0x00007f8c4001b800 nid=0x15ef runnable 

"Gang worker#1 (Parallel GC Threads)" os_prio=0 tid=0x00007f8c4001d800 nid=0x15f0 runnable 

"Gang worker#2 (Parallel GC Threads)" os_prio=0 tid=0x00007f8c4001f000 nid=0x15f1 runnable 

"Gang worker#3 (Parallel GC Threads)" os_prio=0 tid=0x00007f8c40021000 nid=0x15f2 runnable 

"Concurrent Mark-Sweep GC Thread" os_prio=0 tid=0x00007f8c40064000 nid=0x15f3 runnable 

"VM Periodic Task Thread" os_prio=0 tid=0x00007f8c40165800 nid=0x15fe waiting on condition 

那么可以推测原因应该是GC回收不了大对象照成的

  1. 查看GC运行状态

jstat -gcutil pid 1000打印出GC运行状态(每秒刷新一次)

  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
  0.00   0.00  91.34 100.00  85.94  82.35   7830   32.715 43582 25846.169 25878.884
  0.00   0.00  91.34 100.00  85.94  82.35   7830   32.715 43582 25846.169 25878.884
  0.00   0.00  92.98 100.00  85.94  82.35   7830   32.715 43582 25846.169 25878.884
  0.00   0.00  94.03 100.00  85.94  82.35   7830   32.715 43582 25846.169 25878.884
  0.00   0.00  94.87 100.00  85.94  82.35   7830   32.715 43582 25846.169 25878.884
  0.00   0.00  95.50 100.00  85.94  82.35   7830   32.715 43582 25846.169 25878.884

果然O(老年代)已经满了,GC把所有线程资源都占了导致其他线程无法运行照成了假死状态

  1. 分析heap内存状态,找出具体代码问题
    jmap -dump:format=b,file=./heap.hprof [pid]生成heap文件

使用Memory Analyzer来分析


5dd4ba3fc91af.png

分析出来是订单的定时任务造成GC的问题


Inked5dd4babe7e7d1_LI.jpg

接下来去改代码就可以了。主要的方法就是先去看日志,日志找不到原因就去线程的堆栈信息里面找原因。


当然除了以上的办法还有几个有用的命令能更快地定位到线程问题

top -Hp pid 查看进程下线程的cpu和内存状态


 PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND        
25246 tomcat    20   0 4435752 986.7m   7368 S  70.3 40.6   0:21.32 java      //这个线程占了很高的cpu和内存                                                                                                                                                                                                                                                                                                                                                            
25243 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:00.00 java                                                                                                                                                                                     
25244 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:00.88 java                                                                                                                                                                                     
25245 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:20.04 java                                                                                                                                                                                                                                                                                                                                                             
25247 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:19.99 java                                                                                                                                                                                     
25248 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:19.98 java                                                                                                                                                                                     
25249 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   1:36.60 java                                                                                                                                                                                     
25250 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   1:47.24 java                                                                                                                                                                                     
25251 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:00.04 java                                                                                                                                                                                     
25252 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:00.04 java                                                                                                                                                                                     
25253 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:00.00 java                                                                                                                                                                                     
25254 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:00.00 java                                                                                                                                                                                     
25255 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:00.44 java                                                                                                                                                                                     
25256 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:00.07 java                                                                                                                                                                                     
25258 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:00.59 java                                                                                                                                                                                     
25259 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   1:05.06 java                                                                                                                                                                                     
25260 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   0:59.96 java                                                                                                                                                                                     
25261 tomcat    20   0 4435752 986.7m   7368 S  0.0 12.6   1:06.67 java
25336 tomcat    20   0 4435752 986.7m   7368 S  0.7 12.6   0:42.12 java 

ps -mp pid -o THREAD,tid,time查看当前进程下所有线程的耗时

USER     %CPU PRI SCNT WCHAN  USER SYSTEM   TID     TIME
tomcat    9.6   -    - -         -      -     - 00:28:07
tomcat    0.0  19    - futex_    -      - 25243 00:00:00
tomcat    0.0  19    - poll_s    -      - 25244 00:00:00
tomcat    0.1  19    - futex_    -      - 25245 00:00:20
tomcat    0.1  19    - futex_    -      - 25246 01:00:21   //这个线程执行了一个多小时,很可能有问题
tomcat    0.1  19    - futex_    -      - 25247 00:00:19
tomcat    0.1  19    - futex_    -      - 25248 00:00:19
tomcat    0.5  19    - futex_    -      - 25249 00:01:36
tomcat    0.6  19    - futex_    -      - 25250 00:01:47
tomcat    0.0  19    - futex_    -      - 25251 00:00:00
tomcat    0.0  19    - futex_    -      - 25252 00:00:00
tomcat    0.0  19    - futex_    -      - 25253 00:00:00
tomcat    0.0  19    - futex_    -      - 25254 00:00:00
tomcat    0.0  19    - futex_    -      - 25255 00:00:00
tomcat    0.0  19    - futex_    -      - 25256 00:00:00
tomcat    0.0  19    - ep_pol    -      - 25258 00:00:00
tomcat    0.3  19    - futex_    -      - 25259 00:01:05
tomcat    0.3  19    - futex_    -      - 25260 00:00:59

printf “%x\n” 25246转换线程id为16进制格式 629en
jstack pid |grep 629en -A 30 输出该线程的30行堆栈信息

"http-nio2-8089-exec-195" #344 daemon prio=5 os_prio=0 tid=0x00007fa8980de000 nid=0x6265 runnable [0x00007fa85e6d4000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
        at java.lang.StringBuilder.append(StringBuilder.java:136)
        at org.apache.shiro.web.util.WebUtils.normalize(WebUtils.java:194)
        at org.apache.shiro.web.util.WebUtils.normalize(WebUtils.java:155)
        at org.apache.shiro.web.util.WebUtils.getRequestUri(WebUtils.java:140)
        at org.apache.shiro.web.util.WebUtils.getPathWithinApplication(WebUtils.java:112)
        at org.apache.shiro.web.filter.mgt.PathMatchingFilterChainResolver.getPathWithinApplication(PathMatchingFilterChainResolver.java:147)
        at org.apache.shiro.web.filter.mgt.PathMatchingFilterChainResolver.getChain(PathMatchingFilterChainResolver.java:99)
        at org.apache.shiro.web.servlet.AbstractShiroFilter.getExecutionChain(AbstractShiroFilter.java:415)
        at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:448)
        at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
        at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
        at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
        at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
        at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
        at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at com.github.common.component.biz.Biz.doBizMethod(Biz.java:30)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
        at org.springframework.boot.actuate.trace.WebRequestTraceFilter.doFilterInternal(WebRequestTraceFilter.java:111)
        at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
...
上一篇下一篇

猜你喜欢

热点阅读