java编程语言-JVM系列Java Concurrency

线程池引发的故障到底该怎么排查?

2019-06-27  本文已影响0人  sunshujie1990

前情提要

最近读了一篇关于线程池故障排查的文章,收货颇丰.文章地址如下:

https://mp.weixin.qq.com/s/0U_xmzvmnY354Lgv0IwZ6g

这里简要回顾一下,感兴趣的同学可以仔细读一下这篇文章:

排查过程优化

问题是找到了,但是原文排查过程颇为艰辛.那么有没有更为有效的方法来快速定位线程池打满的问题呢?

可以通过jstack dump线程,然后对线程进行分析,从而快速定位问题!

场景构建

这里构建一个简单的环境复现上述场景:

  1. 构建一个spring boot项目,简单起见使用Tmocat线程池代替案例中的dubbo线程池.项目引入了httpclient, spring boot会自动将RestTemplate配置为httpclient的实现.

      <dependencies>
        <dependency>
          <groupId>org.springframework.boot</groupId>
          <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
    
        <dependency>
          <groupId>org.apache.httpcomponents</groupId>
          <artifactId>httpclient</artifactId>
          <version>4.5.4</version>
        </dependency>
    
        <dependency>
          <groupId>org.springframework.boot</groupId>
          <artifactId>spring-boot-starter-test</artifactId>
          <scope>test</scope>
        </dependency>
      </dependencies>
    

    设置Tomcat工作线程池,最小10个线程,最大20个线程.这里只是为了模拟场景,所以最大线程池设置的比较小.

    server.tomcat.min-spare-threads=10
    server.tomcat.max-threads=20
    
  1. 编写一个http接口模拟案例中的外部接口,这里睡眠3秒模拟接口超时

        @GetMapping("/slowApi")
        public String slowApi() throws InterruptedException {
            TimeUnit.SECONDS.sleep(3);
            return "ok!";
        }
    
  2. 构建一个RestTemplate,连接超时,读取超时设置为2秒

        @Bean
        public RestTemplate restTemplate(RestTemplateBuilder builder) {
            return builder.setConnectTimeout(Duration.ofSeconds(2))
                    .setReadTimeout(Duration.ofSeconds(2)).build();
        }
    
  3. 提供一个接口,使用RestTemplate访问外部接口

        @GetMapping("/getMsg")
        public String getMsg() {
            try {
                String demo = restTemplate.getForObject("http://localhost:8080/slowApi", String.class);
                return ">>>" + demo;
    
            } catch (Exception e) {
                return "failed!";
            }
        }
    
  4. 编写一个简单的测试模拟并发访问

    System.setProperty("http.maxConnections", String.valueOf(THREAD_POOL_SIZE));这里是设置HttpClient最大连接数,否则将达不到我们所期望的并发量.

    public class ConcurrenceTest {
        private static final int TOTAL_REQUEST = 100000;
        private static final int THREAD_POOL_SIZE = 50;
        @Test
        public void test() throws InterruptedException {
            System.setProperty("http.maxConnections", String.valueOf(THREAD_POOL_SIZE));
            RestTemplateBuilder builder = new RestTemplateBuilder();
            RestTemplate restTemplate = builder.build();
            ExecutorService executorService = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
            CountDownLatch countDownLatch = new CountDownLatch(TOTAL_REQUEST);
            for (int i = 0; i < TOTAL_REQUEST; i++) {
                executorService.execute(()->{
                    restTemplate.getForObject("http://localhost:8080/getMsg", String.class);
                    countDownLatch.countDown();
                });
            }
            countDownLatch.await();
            executorService.shutdown();
        }
    }
    

以上代码已经上传至:https://github.com/sunshujie1990/thread-troubleshooting

故障定位

  1. 启动项目
  2. 通过jps -l找到项目的进程号, 如下是16555
ssj@ssj-PC$: jps -l
16555 com.shujie.threadtroubleshooting.ThreadTroubleshootingApplication
  1. 通过jstack dump线程信息,如下会将线程信息dump到一个名为thread.txt的文件中
jstack 16555 > thread.txt
  1. 启动并发测试,通过浏览器访问http://localhost:8080/getMsg一直等待,说明Tomcat线程池已经被打满了,无法对外提供访问.
  2. 再次dump线程信息到另外一个文件
jstack 16555 > thread2.txt
  1. 分析dump线程信息

首先打开thread.txt,看下健康的Tomcat线程池是什么样子的:

"http-nio-8080-exec-10" #34 daemon prio=5 os_prio=0 tid=0x00007fa500bf9000 nid=0x40ea waiting on condition [0x00007fa4e02dc000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007844d1038> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:107)
    at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:33)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)

然后打开thread2.txt,再来看一下线程池被打满之后的线程信息

"http-nio-8080-exec-20" #54 daemon prio=5 os_prio=0 tid=0x00007fa468011800 nid=0x421c waiting on condition [0x00007fa4cd6ff000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007067027e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:379)
    at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
    at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:245)
    - locked <0x00000007824713a0> (a org.apache.http.pool.AbstractConnPool$2)
    at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:303)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:87)
    at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
    at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:735)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:670)
    at org.springframework.web.client.RestTemplate.getForObject(RestTemplate.java:311)
    at com.shujie.threadtroubleshooting.ThreadTroubleshootingApplication.getMsg(ThreadTroubleshootingApplication.java:35)

至此排查过程结束.

总结

线程池被打满,无法对外提供服务是非常严重的故障.产生的原因也是多种多样,不仅仅限于本文提到的这个场景.本文希望能够通过这个简单的例子帮助大家在遇到类似的问题时快速定位BUG,而不至于一脸懵逼.

上一篇 下一篇

猜你喜欢

热点阅读