Java 理解 ThreadPoolExecutor 实现原理

2019-05-30 本文已影响6人殷天文

使用线程池（ThreadPoolExecutor）的好处是减少在创建和销毁线程上所花的时间以及系统资源的开销，解决资源不足的问题。如果不使用线程池，有可能造成系统创建大量同类线程而导致消耗完内存或者“过度切换”的问题。 -- 阿里Java开发手册

多线程和线程池的重要性，大家都晓得，废话不多说直接进入主题

版本

JDK 1.8

本节目标

理解线程池核心参数
理解线程池工作原理
理解线程池核心方法

线程池的核心参数和构造方法

ctl

    // 线程池核心变量，包含线程池的运行状态和有效线程数，利用二进制的位掩码实现
    private final AtomicInteger ctl = new AtomicInteger(ctlOf(RUNNING, 0));
    private static final int COUNT_BITS = Integer.SIZE - 3;
    private static final int CAPACITY   = (1 << COUNT_BITS) - 1;

    // runState is stored in the high-order bits
    // 线程池状态
    private static final int RUNNING    = -1 << COUNT_BITS;
    private static final int SHUTDOWN   =  0 << COUNT_BITS;
    private static final int STOP       =  1 << COUNT_BITS;
    private static final int TIDYING    =  2 << COUNT_BITS;
    private static final int TERMINATED =  3 << COUNT_BITS;

    // Packing and unpacking ctl
    // 获取当前线程池运行状态
    private static int runStateOf(int c)     { return c & ~CAPACITY; }
    // 获取当前线程池有效线程数
    private static int workerCountOf(int c)  { return c & CAPACITY; }
    // 打包ctl变量
    private static int ctlOf(int rs, int wc) { return rs | wc; }

    /*
     * Bit field accessors that don't require unpacking ctl.
     * These depend on the bit layout and on workerCount being never negative.
     */

    private static boolean runStateLessThan(int c, int s) {
        return c < s;
    }

    private static boolean runStateAtLeast(int c, int s) {
        return c >= s;
    }

    private static boolean isRunning(int c) {
        return c < SHUTDOWN;
    }

JDK7 以后，线程池的状态和有效线程数通过 ctl 这个变量表示（使用二进制的位掩码来实现，这里我们不深究），理解上述几个方法作用即可，不影响下面的源码阅读

关于线程池的五种状态

RUNNING：接受新任务并处理队列中的任务
SHUTDOWN ：不接受新任务，但处理队列中的任务
STOP ：不接受新任务，不处理队列中的任务，并中断正在进行的任务（中断并不是强制的，只是修改了Thread的状态，是否中断取决于Runnable 的实现逻辑）
TIDYING ：所有任务都已终止，workerCount为0时，线程池会过度到该状态，并即将调用 terminate()
TERMINATED ：terminated() 调用完成；线程池中止

线程池状态的转换

RUNNING => SHUTDOWN ：调用 shutdown()
RUNNING / SHUTDOWN => STOP ：调用 shutdownNow() （该方法会返回队列中未执行的任务）
SHUTDOWN => TIDYING：当线程池和队列都为空时
STOP => TIDYING：当线程池为空时
TIDYING => TERMINATED：当 terminated() 调用完成时

构造方法

线程池最终都是调用如下构造方法

public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue,
                          ThreadFactory threadFactory,
                          RejectedExecutionHandler handler) {
  // 省略
}

核心参数

我们来看一下线程池中的核心参数都是什么作用

private final BlockingQueue<Runnable> workQueue; // 阻塞队列，用于缓存任务

private final ReentrantLock mainLock = new ReentrantLock(); // 线程池主锁

private final HashSet<Worker> workers = new HashSet<Worker>(); // 工作线程集合

private final Condition termination = mainLock.newCondition(); // awaitTermination() 方法的等待条件

private int largestPoolSize; // 记录最大线程池大小

private long completedTaskCount; //用来记录线程池中曾经出现过的最大线程数

private volatile ThreadFactory threadFactory; // 线程工厂，用于创建线程

private volatile RejectedExecutionHandler handler; // 任务拒绝时的策略

private volatile long keepAliveTime; // 线程存活时间
                                     // 当线程数超过核心池数时，或允许核心池线程超时，该参数会起作用。否则一直会等待新的任务

private volatile boolean allowCoreThreadTimeOut; // 是否允许核心池线程超时

private volatile int corePoolSize; // 核心线程池数量

private volatile int maximumPoolSize; // 最大线程池数量

关于 handler

ThreadPoolExecutor.AbortPolicy:丢弃任务并抛出RejectedExecutionException异常。
ThreadPoolExecutor.DiscardPolicy：也是丢弃任务，但是不抛出异常。
ThreadPoolExecutor.DiscardOldestPolicy：丢弃队列最前面的任务，然后重新尝试执行任务（重复此过程）
ThreadPoolExecutor.CallerRunsPolicy：当前任务自己决定

下面我们来举个栗子来更好的理解一下线程池

理解线程池工作原理

假如有一个工厂，工厂里面有10个工人，每个工人同时只能做一件任务。

因此只要当10个工人中有工人是空闲的，来了任务就分配给空闲的工人做；

当10个工人都有任务在做时，如果还来了任务，就把任务进行排队等待；

每个工人做完自己的任务后，会去任务队列中领取新的任务；

如果说新任务数目增长的速度远远大于工人做任务的速度（任务累积过多时），那么此时工厂主管可能会想补救措施，比如重新招4个临时工人进来；

然后就将任务也分配给这4个临时工人做；

如果说着14个工人做任务的速度还是不够，此时工厂主管可能就要考虑不再接收新的任务或者抛弃前面的一些任务了。

当这14个工人当中有人空闲时，而新任务增长的速度又比较缓慢，工厂主管可能就考虑辞掉4个临时工了，只保持原来的10个工人，毕竟请额外的工人是要花钱的。

开始工厂的10个工人，就是 corePoolSize (核心池数量)；
当10个人都在工作时 (核心池达到 corePoolSize)，任务排队等待时，会缓存到 workQueue 中；
当任务累积过多时(达到 workQueue 最大值时)，找临时工；
14个临时工，就是 maximumPoolSize (数量)；
如果此时工作速度还是不够，线程池这时会考虑拒绝任务，具体由拒绝策略决定

理解线程池核心方法

execute()

线程池中所有执行任务的方法有关的方法，都会调用 execute()。所以理解这个方法很重要

public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    /*
     * Proceed in 3 steps:
     *
     * 1. If fewer than corePoolSize threads are running, try to
     * start a new thread with the given command as its first
     * task.  The call to addWorker atomically checks runState and
     * workerCount, and so prevents false alarms that would add
     * threads when it shouldn't, by returning false.
     *
     * 2. If a task can be successfully queued, then we still need
     * to double-check whether we should have added a thread
     * (because existing ones died since last checking) or that
     * the pool shut down since entry into this method. So we
     * recheck state and if necessary roll back the enqueuing if
     * stopped, or start a new thread if there are none.
     *
     * 3. If we cannot queue task, then we try to add a new
     * thread.  If it fails, we know we are shut down or saturated
     * and so reject the task.
     */
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command))
            reject(command);
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    }
    else if (!addWorker(command, false))
        reject(command);
}

分析execute()

step 1

1）首先检查当前有效线程数 是否小于 核心池数量
if (workerCountOf(c) < corePoolSize)

2）如果满足上述条件，则尝试向核心池添加一个工作线程（addWorker() 第二个参数决定了是添加核心池，还是最大池）
if (addWorker(command, true))

3）如果成功则退出方法，否则将执行 step2

step 2

1）如果当前线程池处于运行状态 && 尝试向缓冲队列添加任务
if (isRunning(c) && workQueue.offer(command))

2）如果线程池正在运行并且缓冲队列添加任务成功，进行 double check（再次检查）

3）如果此时线程池非运行状态 => 移除队列 => 拒绝当前任务，退出方法
（这么做是为了，当线程池不可用时及时回滚）

if (! isRunning(recheck) && remove(command))
    reject(command);

4）如果当前有效线程数为0，则创建一个无任务的工作线程（此时这个线程会去队列中获取任务），保证提交到线程池的任务一定被执行

step 3

1）当无法无法向核心池和队列中添加任务时，线程池会再尝试向最大池中添加一个工作线程，如果失败则拒绝该任务

else if (!addWorker(command, false))
         reject(command);

图解execute()

根据上述的步骤画了如下的这个图，希望能帮助大家更好的理解

image.png

addWorker()

在分析execute() 方法时，我们已经知道了 addWorker() 的作用了，可以向核心池或者最大池添加一个工作线程。我们来看一下这个方法都做了什么

private boolean addWorker(Runnable firstTask, boolean core) {
    retry:
    for (;;) {
        int c = ctl.get();
        int rs = runStateOf(c);

        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN &&
            ! (rs == SHUTDOWN &&
               firstTask == null &&
               ! workQueue.isEmpty()))
            return false;

        for (;;) {
            int wc = workerCountOf(c);
            if (wc >= CAPACITY ||
                wc >= (core ? corePoolSize : maximumPoolSize))
                return false;
            if (compareAndIncrementWorkerCount(c))
                break retry;
            c = ctl.get();  // Re-read ctl
            if (runStateOf(c) != rs)
                continue retry;
            // else CAS failed due to workerCount change; retry inner loop
        }
    }

    boolean workerStarted = false;
    boolean workerAdded = false;
    Worker w = null;
    try {
        w = new Worker(firstTask);
        final Thread t = w.thread;
        if (t != null) {
            final ReentrantLock mainLock = this.mainLock;
            mainLock.lock();
            try {
                // Recheck while holding lock.
                // Back out on ThreadFactory failure or if
                // shut down before lock acquired.
                int rs = runStateOf(ctl.get());

                if (rs < SHUTDOWN ||
                    (rs == SHUTDOWN && firstTask == null)) {
                    if (t.isAlive()) // precheck that t is startable
                        throw new IllegalThreadStateException();
                    workers.add(w);
                    int s = workers.size();
                    if (s > largestPoolSize)
                        largestPoolSize = s;
                    workerAdded = true;
                }
            } finally {
                mainLock.unlock();
            }
            if (workerAdded) {
                t.start();
                workerStarted = true;
            }
        }
    } finally {
        if (! workerStarted)
            addWorkerFailed(w);
    }
    return workerStarted;
}

这个方法代码看似很复杂，没关系，我们一步一步来分析

step 1
先看第一部分

    retry:
    for (;;) {
        int c = ctl.get();
        int rs = runStateOf(c);

        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN &&
            ! (rs == SHUTDOWN &&
               firstTask == null &&
               ! workQueue.isEmpty()))
            return false;

        for (;;) {
            int wc = workerCountOf(c);
            if (wc >= CAPACITY ||
                wc >= (core ? corePoolSize : maximumPoolSize))
                return false;
            if (compareAndIncrementWorkerCount(c))
                break retry;
            c = ctl.get();  // Re-read ctl
            if (runStateOf(c) != rs)
                continue retry;
            // else CAS failed due to workerCount change; retry inner loop
        }
    }

这一部分代码，主要是判断，是否可以添加一个工作线程。

在execute()中已经判断过if (workerCountOf(c) < corePoolSize)了，为什么还要再判断？

因为在多线程环境中，当上下文切换到这里的时候，可能线程池已经关闭了，或者其他线程提交了任务，导致workerCountOf(c) > corePoolSize

1）首先进入第一个无限for循环，获取ctl对象，获取当前线程的运行状态，然后判断

if (rs >= SHUTDOWN &&
    ! (rs == SHUTDOWN &&
       firstTask == null &&
       ! workQueue.isEmpty()))
    return false;

这个判断的意义为，当线程池运行状态 >= SHUTDOWN 时，必须同时满足 rs == SHUTDOWN, firstTask == null, ! workQueue.isEmpty() 三个条件，如果有一个条件未满足，! (rs == SHUTDOWN && firstTask == null && ! workQueue.isEmpty()) 会为true，导致 return false; 添加线程失败

所以当线程状态为SHUTDOWN时，而队列又有任务时，线程池可以添加一个无任务的工作线程去执行队列中的任务。

2）进入第二个无限for循环

for (;;) {
    int wc = workerCountOf(c);
    if (wc >= CAPACITY ||
        wc >= (core ? corePoolSize : maximumPoolSize))
        return false;
    if (compareAndIncrementWorkerCount(c))
        break retry;
    c = ctl.get();  // Re-read ctl
    if (runStateOf(c) != rs)
        continue retry;
    // else CAS failed due to workerCount change; retry inner loop
}

获取当前有效线程数，if 有效线程数 >= 容量 || 有效线程数 >= 核心池数量/最大池数量，则return false; 添加线程失败

如果有效线程数在合理范围之内，尝试使用 CAS 自增有效线程数（CAS 是Java中的乐观锁，不了解的小伙伴可以Google一下）

如果自增成功，break retry; 跳出这两个循环，执行下面的代码

自增失败，检查线程池状态，如果线程池状态发生变化，回到第一个for 继续执行；否则继续在第二个for 中；

step 2
下面这部分就比较简单了

boolean workerStarted = false;
boolean workerAdded = false;
Worker w = null;
try {
    w = new Worker(firstTask);
    final Thread t = w.thread;
    if (t != null) {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            // Recheck while holding lock.
            // Back out on ThreadFactory failure or if
            // shut down before lock acquired.
            int rs = runStateOf(ctl.get());

            if (rs < SHUTDOWN ||
                (rs == SHUTDOWN && firstTask == null)) {
                if (t.isAlive()) // precheck that t is startable
                    throw new IllegalThreadStateException();
                workers.add(w);
                int s = workers.size();
                if (s > largestPoolSize)
                    largestPoolSize = s;
                workerAdded = true;
            }
        } finally {
            mainLock.unlock();
        }
        if (workerAdded) {
            t.start();
            workerStarted = true;
        }
    }
} finally {
    if (! workerStarted)
        addWorkerFailed(w);
}
return workerStarted;

1）创建工作线程对象Worker；

2）加锁，判断当前线程池状态是否允许启动线程；
如果可以，将线程加入workers（这个变量在需要遍历所有工作线程时会用到），记录最大值，启动线程；

3）如果线程启动失败，执行addWorkerFailed（从workers中移除该对象，有效线程数减一，尝试中止线程池）

Worker

Worker对象是线程池中的内部类，线程的复用、线程超时都是在这实现的

private final class Worker
    extends AbstractQueuedSynchronizer
    implements Runnable
{
    /**
     * This class will never be serialized, but we provide a
     * serialVersionUID to suppress a javac warning.
     */
    private static final long serialVersionUID = 6138294804551838833L;

    /** Thread this worker is running in.  Null if factory fails. */
    final Thread thread;
    /** Initial task to run.  Possibly null. */
    Runnable firstTask;
    /** Per-thread task counter */
    volatile long completedTasks;

    /**
     * Creates with given first task and thread from ThreadFactory.
     * @param firstTask the first task (null if none)
     */
    Worker(Runnable firstTask) {
        setState(-1); // inhibit interrupts until runWorker
        this.firstTask = firstTask;
        this.thread = getThreadFactory().newThread(this);
    }

    /** Delegates main run loop to outer runWorker  */
    public void run() {
        runWorker(this);
    }

    // Lock methods
    //
    // The value 0 represents the unlocked state.
    // The value 1 represents the locked state.

    protected boolean isHeldExclusively() {
        return getState() != 0;
    }

    protected boolean tryAcquire(int unused) {
        if (compareAndSetState(0, 1)) {
            setExclusiveOwnerThread(Thread.currentThread());
            return true;
        }
        return false;
    }

    protected boolean tryRelease(int unused) {
        setExclusiveOwnerThread(null);
        setState(0);
        return true;
    }

    public void lock()        { acquire(1); }
    public boolean tryLock()  { return tryAcquire(1); }
    public void unlock()      { release(1); }
    public boolean isLocked() { return isHeldExclusively(); }

    void interruptIfStarted() {
        Thread t;
        if (getState() >= 0 && (t = thread) != null && !t.isInterrupted()) {
            try {
                t.interrupt();
            } catch (SecurityException ignore) {
            }
        }
    }
}

Worker 实现了 Runnable，我们这里只关心 Worker 的run方法中做了什么，关于 AbstractQueuedSynchronizer 有关的不在本文讨论

    public void run() {
        runWorker(this);
    }

final void runWorker(Worker w) {
    Thread wt = Thread.currentThread();
    Runnable task = w.firstTask;
    w.firstTask = null;
    w.unlock(); // allow interrupts
    boolean completedAbruptly = true;
    try {
        while (task != null || (task = getTask()) != null) {
            w.lock();
            // If pool is stopping, ensure thread is interrupted;
            // if not, ensure thread is not interrupted.  This
            // requires a recheck in second case to deal with
            // shutdownNow race while clearing interrupt
            if ((runStateAtLeast(ctl.get(), STOP) ||
                 (Thread.interrupted() &&
                  runStateAtLeast(ctl.get(), STOP))) &&
                !wt.isInterrupted())
                wt.interrupt();
            try {
                beforeExecute(wt, task);
                Throwable thrown = null;
                try {
                    task.run();
                } catch (RuntimeException x) {
                    thrown = x; throw x;
                } catch (Error x) {
                    thrown = x; throw x;
                } catch (Throwable x) {
                    thrown = x; throw new Error(x);
                } finally {
                    afterExecute(task, thrown);
                }
            } finally {
                task = null;
                w.completedTasks++;
                w.unlock();
            }
        }
        completedAbruptly = false;
    } finally {
        processWorkerExit(w, completedAbruptly);
    }
}

runWork还是比较简单的

while 循环不断地通过 getTask() 从任务队列中获取任务；
确保线程池STOP时，线程会被打断；
然后就是执行 task.run，也就是执行我们定义的Runnable；
如果我们的Runnable 抛出异常或者 getTask() == null （也就是缓冲队列为空）时，会执行 processWorkerExit(w, completedAbruptly); （该方法会根据线程池状态，尝试中止线程池。然后会考虑是结束当前线程，还是再新建一个工作线程，这里就不细说了）
beforeExecute(wt, task); 和 afterExecute(task, thrown); 默认是没有实现的，我们可以自己扩展

我们再来看一下 getTask() 方法

private Runnable getTask() {
    boolean timedOut = false; // Did the last poll() time out?

    for (;;) {
        int c = ctl.get();
        int rs = runStateOf(c);

        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
            decrementWorkerCount();
            return null;
        }

        int wc = workerCountOf(c);

        // Are workers subject to culling?
        boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

        if ((wc > maximumPoolSize || (timed && timedOut))
            && (wc > 1 || workQueue.isEmpty())) {
            if (compareAndDecrementWorkerCount(c))
                return null;
            continue;
        }

        try {
            Runnable r = timed ?
                workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
                workQueue.take();
            if (r != null)
                return r;
            timedOut = true;
        } catch (InterruptedException retry) {
            timedOut = false;
        }
    }
}

1） if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) 当满足该条件时自减有效线程数，并返回 null；当return null 时，runWorker中就会去执行上述说的 processWorkerExit(w, completedAbruptly);

2） timed：是否允许线程超时；
当允许核心池线程超时 || 有效线程数 > 核心池数量 时，timed = true；
顺便说一下 timedOut：线程是否超时，当timed = true 时，timedOut 才有可能为true

3）

// (有效线程数 > 最大线程池数量 || (允许超时 && 超时) ) 
//  && (有效线程数 > 1 || 或者队列为空时)
if ((wc > maximumPoolSize || (timed && timedOut))
    && (wc > 1 || workQueue.isEmpty())) {
    if (compareAndDecrementWorkerCount(c))
        return null;
    continue;
}

满足上述条件，并且有效线程数自减成功后，return null;

4）根据 timed，决定调用使用poll() 或者 take()。poll 在队列为空时会等待指定时间，take 则在队列为空时会一直等待，直至队列中被添加新的任务，或者被打断；

这两个方法都会被shutdown() 或者 shutdownNow的 thread.interrupt()打断；
如果被打断则重新执行

只有在 poll 等待超时的时候才会return null； timeOut = true，此时再看步骤3 的 (timed && timedOut) 的就满足了

至此 execute() 方法所涉及的大部分逻辑我们都分析完了

备注

线程池使用

public class Test {

    public static void main(String[] args) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(5, 10, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(5));

        executor.execute(() -> {
            // 业务逻辑
        });

        executor.shutdown();
    }

}

合理配置线程池的大小

一般需要根据任务的类型来配置线程池大小：

如果是CPU密集型任务，参考值可以设为 N+1 （N 为CPU核心数）

如果是IO密集型任务，参考值可以设置为2*N

当然，这只是一个参考值，具体的设置还需要根据实际情况进行调整，比如可以先将线程池大小设置为参考值，再观察任务运行情况和系统负载、资源利用率来进行适当调整。