从堆栈角度看线程状态(7)
异常信息
见过很多这样的人,写代码、调式代码的过程中,程序一旦出现异常,就很着急的去清空日志输出控制台,清空的异常日志信息,然后尝试各种重启方式。异常一直出现就开始嘀咕了,刚才还好好的啊、以前没出现过这问题啊......
没有仔细认真的看异常是很多新手容易犯的错误,不少老手也会存在这样的问题,不知道你是否躺枪。其实,认真的看异常问题就解决了一半了。异常信息是我们排查问题的方向。对于没有异常的、或是把异常吃掉的这种程序问题,问题定位、排查是相对困难的,至少问题定位不会那么直接,可能会把整个业务条线、甚至对整个系统体系进行分析才能摸到问题产生的原因。
可见异常信息对问题的排查提供了可靠的依据,异常信息反应的是程序调用的整个过程,调了哪个方法、是哪个实现类、卡在什么地方了......
异常信息是线程执行过程中出现异常时,输出的线程堆栈信息,下面我们看看线程各种状态对应的堆栈信息。
线程堆栈信息解读
堆栈状态WAITING与TIMED_WAITING:
调用以下三个方法会进入WAITING状态:
- Object.wait() 不设置超时时间
- Thread.join() 不设置超时时间
- LockSupport.park() 不设置超时时间
调用下面的方法会进入TIMED_WAITING状态:
- Object.wait(time)
- Thread.join(time)
- Thread.sleep(time)
- LockSupport.parkNanos(time)
- LockSupport.parkUntil(time)
RUNNABLE和BLOCKED状态
- 在执行中的线程处于RUNNABLE状态
- 等待监视器同步锁的线程处于BLOCKED状态
堆栈信息中不存在NEW和TERMINATED状态
看代码示例:
package net.zkbc.p2p;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
/**
* @author HXJ
* @date 2018/8/1
*/
public class ThreadStack {
public static void main(String[] args) {
final Object blocker = new Object();
//运行状态的线程
Thread threadRun = new Thread(new Runnable() {
@Override
public void run() {
while (true) {
//do something ...
}
}
}, "Thread-run");
//同步阻塞状态的线程
Thread threadBlock = new Thread(new Runnable() {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + " 同步锁阻塞 !");
synchronized (blocker) {
System.out.println(Thread.currentThread().getName() + " 取得同步锁 run ... !");
}
}
}, "Thread-block");
//wait等待
Thread threadWait = new Thread(new Runnable() {
@Override
public void run() {
synchronized (this) {
System.out.println(Thread.currentThread().getName() + " wait ... !");
try {
this.wait();
} catch (InterruptedException e) {
}
}
}
}, "Thread-wait");
//wait timeout 等待
Thread threadWaitTimeout = new Thread(new Runnable() {
@Override
public void run() {
synchronized (this) {
System.out.println(Thread.currentThread().getName() + " wait ... !");
try {
this.wait(10 * 1000 * 60);
} catch (InterruptedException e) {
}
}
}
}, "Thread-wait-timeout");
//sleep 等待
Thread threadSleepWaitTimeout = new Thread(new Runnable() {
@Override
public void run() {
try {
System.out.println(Thread.currentThread().getName() + " wait ... !");
Thread.sleep(10 * 1000 * 60);
} catch (InterruptedException e) {
}
}
}, "Thread-sleep-wait-timeout");
//lock.park 等待 为了显示相对应阻塞的堆栈信息
Lock lock = new ReentrantLock();
//lock获得锁后执行的线程
Thread lockParkRun = new Thread(new Runnable() {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + " lock-park run...");
lock.lock();
while (true) {
//do something ...
}
}
}, "Thread-lock.park-run");
//lock失败后park()阻塞的线程
Thread lockPark = new Thread(new Runnable() {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + " lock-park park ...");
lock.lock();
}
}, "Thread-lock.park-park");
try {
synchronized (blocker) {
System.out.println(threadRun.getName() + " 新建状态:" +threadRun.getState());
threadRun.start();
threadBlock.start();
threadWait.start();
threadWaitTimeout.start();
threadSleepWaitTimeout.start();
lockParkRun.start();
lockParkRun.join(100);
lockPark.start();
//主线程不释放锁,并且通过等待阻塞线程的方式进行阻塞
threadBlock.join();
}
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
运行结果:
Thread-run 新建状态:NEW
Thread-block 同步锁阻塞 !
Thread-wait wait ... !
Thread-wait-timeout wait ... !
Thread-sleep-wait-timeout wait ... !
Thread-lock.park-run lock-park run...
Thread-lock.park-park lock-park park ...
代码中threadRun线程在start()之前,状态是NEW,启动后在堆栈信息中为RUNNABLE状态。threadRun是个空死循环,测试代码的时候注意自己的电脑哦!
主线程创建完其他线程后,获取blocker对象上的同步锁,然后分别启动其他线程。threadBlock线程启动后,获取blocker对象上的同步锁,由于该锁已经被主线程持有,所有进入BLOCKED阻塞状态。其他线程进入相应状态,最后避免主线程退出,直接让它等待在threadBlock线程监视器上,由于threadBlock无法获得同步锁而无法执行,至此在其他超时等待的线程超时前有如下堆栈信息(部分):
"Thread-lock.park-park" #18 prio=5 os_prio=0 tid=0x00000000201a1800 nid=0x70f0 waiting on condition [0x0000000020fcf000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076bdf6070> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at net.zkbc.p2p.ThreadStack$7.run(ThreadStack.java:90)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"Thread-lock.park-run" #17 prio=5 os_prio=0 tid=0x000000002018c000 nid=0x7740 runnable [0x0000000020ecf000]
java.lang.Thread.State: RUNNABLE
at net.zkbc.p2p.ThreadStack$6.run(ThreadStack.java:81)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- <0x000000076bdf6070> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"Thread-sleep-wait-timeout" #16 prio=5 os_prio=0 tid=0x000000002018b000 nid=0x7a78 waiting on condition [0x0000000020dcf000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at net.zkbc.p2p.ThreadStack$5.run(ThreadStack.java:68)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"Thread-wait-timeout" #15 prio=5 os_prio=0 tid=0x0000000020188800 nid=0x62d4 in Object.wait() [0x0000000020cce000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000076bded8b8> (a net.zkbc.p2p.ThreadStack$4)
at net.zkbc.p2p.ThreadStack$4.run(ThreadStack.java:55)
- locked <0x000000076bded8b8> (a net.zkbc.p2p.ThreadStack$4)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"Thread-wait" #14 prio=5 os_prio=0 tid=0x0000000020187800 nid=0x759c in Object.wait() [0x0000000020bcf000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000076bdead90> (a net.zkbc.p2p.ThreadStack$3)
at java.lang.Object.wait(Object.java:502)
at net.zkbc.p2p.ThreadStack$3.run(ThreadStack.java:41)
- locked <0x000000076bdead90> (a net.zkbc.p2p.ThreadStack$3)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"Thread-block" #13 prio=5 os_prio=0 tid=0x0000000020150800 nid=0x4f70 waiting for monitor entry [0x0000000020acf000]
java.lang.Thread.State: BLOCKED (on object monitor)
at net.zkbc.p2p.ThreadStack$2.run(ThreadStack.java:29)
- waiting to lock <0x000000076bb53ef0> (a java.lang.Object)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"Thread-run" #12 prio=5 os_prio=0 tid=0x000000002014f800 nid=0x4f04 runnable [0x00000000209cf000]
java.lang.Thread.State: RUNNABLE
at net.zkbc.p2p.ThreadStack$1.run(ThreadStack.java:17)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"main" #1 prio=5 os_prio=0 tid=0x00000000036ac800 nid=0x779c in Object.wait() [0x000000000369f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000076bb56140> (a java.lang.Thread)
at java.lang.Thread.join(Thread.java:1252)
- locked <0x000000076bb56140> (a java.lang.Thread)
at java.lang.Thread.join(Thread.java:1326)
at net.zkbc.p2p.ThreadStack.main(ThreadStack.java:106)
- locked <0x000000076bb53ef0> (a java.lang.Object)
Locked ownable synchronizers:
- None
可以通过jstack命令获取堆栈信息文件,如:jstack -l pid > stack.log
我们可以清晰的从堆栈信息中看到每个线程当前的状态,新建、终止状态不在这里体现,也清晰的反应出程序调用的每一个过程。用好堆栈信息,很多问题便可迎刃而解!
Waiting for monitor entry 和 in Object.wait()
这两种阻塞情况是基于Monitor实现的,Monitor是Java中用以实现线程之间的互斥与协作的主要手段,它可以看成是对象或者 Class的锁。每一个对象有且仅有一个 monitor。如图:
从图中可以看出,每个 Monitor在任何时刻,只能被一个线程拥有,该线程就是 “Active Thread”,而其它线程都是 “Waiting Thread”,分别在两个队列 “ Entry Set”和 “Wait Set”里面等候。在 “Entry Set”中等待的线程状态是 “Waiting for monitorentry”,而在 “Wait Set”中等待的线程状态是“in Object.wait()”。
通过使用关键字synchronized来使用监视器,synchronized保护起来的代码段为临界区。当一个线程申请进入临界区时,它就进入了 “Entry Set”队列。如:
synchronized(obj){
//临界区 do something ...
}
当线程获得了 Monitor,进入了临界区之后,如果发现线程继续运行的条件没有满足,它则调用对象(一般就是被 synchronized 的对象)的 wait() 方法,放弃了 Monitor,进入 “Wait Set”队列。只有当别的线程在该对象上调用了 notify() 或者 notifyAll() , “ Wait Set”队列中线程才得到机会去竞争,但是只有一个线程获得对象的Monitor,恢复到运行态。
synchronized(obj){
...
//未满足执行条件 wait ... 等待obj.notify();
obj.wait();
}
waiting on condition
sleep()、LockSupport.park()都属于这种阻塞,等待条件满足。
线程资源(cpu、内存)占用高案例分析
示例:
在linux环境下,通过命令:top -Hp pid
获得如图:
取将十进制16275对应的线程pid转换为16进制为3F93。
用命令printf或计算器:
命令行:
printf '%x \n' PID
计算器:
进制转换
堆栈查找到对应nid=0x3f93的线程:
...
"Apollo-ReleaseMessageServiceWithCache-1" #25 daemon prio=5 os_prio=0 tid=0x00007fc4e4009800 nid=0x3f94 waiting on condition [0x00007fc51cca2000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000d33338d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- None
"Apollo-AppNamespaceServiceWithCache-1" #24 daemon prio=5 os_prio=0 tid=0x00007fc4e44b9800 nid=0x3f93 runnable [0x00007fc51cce3000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000d33325e8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- None
"Apollo-DatabaseMessageSender-1" #23 daemon prio=5 os_prio=0 tid=0x00007fc4e4b01800 nid=0x3f92 sleeping[0x00007fc51cd24000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at java.lang.Thread.sleep(Thread.java:340)
at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386)
at com.ctrip.framework.apollo.biz.message.DatabaseMessageSender.lambda$initialize$0(DatabaseMessageSender.java:78)
at com.ctrip.framework.apollo.biz.message.DatabaseMessageSender$$Lambda$2/1620636712.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
...
找到对应线程后,进而查看对应线程的具体堆栈信息。