How Android ANRs Are Triggered
Overview
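ANR stands for Application Not Responding. In Android, ActivityManagerService (AMS) and WindowManagerService (WMS) monitor how long an app takes to respond. Before the app's main thread handles certain events, a Handler belonging to AMS, BroadcastQueue, or a related component posts a delayed message to a Looper in the system process. If the event finishes within the delay, the delayed message is removed from the MessageQueue. If it does not, the message stays queued, the Looper eventually dequeues and handles it, and an ANR is triggered. That is the principle behind ANR detection.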
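The add-then-remove pattern described above can be sketched in plain Java. This is only an analogy, not AOSP code: the class TimeoutWatchdogSketch and its method names are hypothetical, and a ScheduledExecutorService stands in for the system-side Handler and Looper.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical plain-Java analogue of the delayed-message watchdog; not AOSP code. */
final class TimeoutWatchdogSketch {

    /**
     * Runs work under a timeout and reports whether the timeout fired.
     * Scheduling the task plays the role of sendMessageDelayed();
     * cancelling it plays the role of removeMessages().
     */
    static boolean timedOut(Runnable work, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean fired = new AtomicBoolean(false);
        try {
            // Step 1: arm the timeout before the monitored work starts.
            ScheduledFuture<?> pending =
                    timer.schedule(() -> fired.set(true), timeoutMillis, TimeUnit.MILLISECONDS);
            // The monitored work (stands in for onCreate(), onReceive(), ...).
            work.run();
            // Step 2: the work is done, so disarm the timeout if it is still pending.
            pending.cancel(false);
            // Step 3: if the timer got there first, the "ANR" was already flagged.
            return fired.get();
        } finally {
            timer.shutdownNow();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast work timed out: " + timedOut(() -> sleepQuietly(10), 500));
        System.out.println("slow work timed out: " + timedOut(() -> sleepQuietly(500), 50));
    }
}
```

One difference from the real mechanism: here the flag is only inspected after the work returns, whereas in Android the timeout message is handled in the system process while the app is still busy.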
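Conditions That Trigger an ANR
- InputDispatching Timeout: a touch or key event is not responded to within 5 seconds;
- BroadcastQueue Timeout: onReceive() of a BroadcastReceiver does not finish within 10 seconds for a foreground broadcast, or within 60 seconds for a background broadcast;
- Service Timeout: a foreground service does not finish executing within 20 seconds, or a background service within 200 seconds;
- ContentProvider Timeout: the ContentProvider's publish step does not finish within 10 seconds.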
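For quick reference, the thresholds above can be collected into a small lookup table. The enum below is a hypothetical illustration, not an Android API. The values are the Android 8.0 figures quoted in this article, and using the same value in both columns for the two cases with no foreground/background split is an assumption of the sketch.

```java
/** Hypothetical lookup table for the thresholds listed above; not an Android API. */
enum AnrTimeout {
    // (foreground millis, background millis); values as quoted in this article
    INPUT_DISPATCHING(5_000, 5_000),          // no fg/bg split: same value twice (assumption)
    BROADCAST(10_000, 60_000),
    SERVICE(20_000, 200_000),
    CONTENT_PROVIDER_PUBLISH(10_000, 10_000); // no fg/bg split: same value twice (assumption)

    private final long foregroundMillis;
    private final long backgroundMillis;

    AnrTimeout(long foregroundMillis, long backgroundMillis) {
        this.foregroundMillis = foregroundMillis;
        this.backgroundMillis = backgroundMillis;
    }

    /** Returns the threshold that applies to a foreground or background component. */
    long millis(boolean foreground) {
        return foreground ? foregroundMillis : backgroundMillis;
    }

    public static void main(String[] args) {
        for (AnrTimeout t : values()) {
            System.out.println(t + ": fg=" + t.millis(true) + "ms, bg=" + t.millis(false) + "ms");
        }
    }
}
```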
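Source Code Analysis (Based on Android 8.0)
1. Service
ActiveServices is an object managed by AMS. It handles starting, stopping, and binding Services. The full Service startup flow is not covered here. Instead, let's look at realStartServiceLocked, the method that actually starts the Service after ContextImpl.startService() is called: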
private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app,
        boolean execInFg) throws RemoteException {
    ...
    // Posts the delayed timeout message
    bumpServiceExecutingLocked(r, execInFg, "create");
    ...
    // Creates the Service and calls its onCreate(); not analyzed further here
    app.thread.scheduleCreateService(r, r.serviceInfo,
            mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
            app.repProcState);
    ...
}
Next, look at bumpServiceExecutingLocked, which calls scheduleServiceTimeoutLocked:
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
    ...
    scheduleServiceTimeoutLocked(r.app);
    ...
}
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    // execServicesFg indicates whether the Service is executing in the foreground
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}
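As shown, mAm.mHandler posts a delayed message (its what value is ActivityManagerService.SERVICE_TIMEOUT_MSG) to the MessageQueue of the thread that the AMS handler runs on. The delay depends on whether the Service executes in the foreground: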
// Defined in ActivityManagerService: the what value of the Service timeout message
static final int SERVICE_TIMEOUT_MSG = 12;
// ActiveServices.java
// How long we wait for a service to finish executing.
// Foreground Service: 20-second timeout
static final int SERVICE_TIMEOUT = 20*1000;
// How long we wait for a service to finish executing.
// Background Service: 10 times the foreground timeout, i.e. 200 seconds
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
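With that, the delayed message is queued on the AMS handler thread and the Service is started. If the Service finishes before the delay expires, the queued message has to be removed. Where does that happen? Following the call chain leads to handleCreateService in ActivityThread: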
private void handleCreateService(CreateServiceData data) {
    ...
    Service service = null;
    java.lang.ClassLoader cl = loadedApk.getClassLoader();
    // Create the Service instance via reflection
    service = (Service) cl.loadClass(data.info.name).newInstance();
    ...
    // Call the Service's onCreate()
    service.onCreate();
    ...
    // Tell AMS that the Service has finished executing
    ActivityManager.getService().serviceDoneExecuting(data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
    ...
}
After the Service's onCreate() has run, a Binder call to serviceDoneExecuting in AMS reports that the Service has started. Here is serviceDoneExecuting in AMS:
public void serviceDoneExecuting(IBinder token, int type, int startId, int res) {
    synchronized(this) {
        ...
        mServices.serviceDoneExecutingLocked((ServiceRecord) token, type, startId, res);
    }
}
serviceDoneExecuting in AMS delegates directly to serviceDoneExecutingLocked in ActiveServices:
void serviceDoneExecutingLocked(ServiceRecord r, int type, int startId, int res) {
    ...
    serviceDoneExecutingLocked(r, inDestroying, inDestroying);
    ...
}
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
        boolean finishing) {
    ...
    // This is where the delayed timeout message is removed
    mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
    ...
}
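Note that the removal is keyed by both the message type and the process record (r.app), so finishing a Service in one process does not cancel a timeout that belongs to another. The toy queue below illustrates that matching. It is a hypothetical, single-threaded stand-in, not Android's MessageQueue, and comparing obj by identity is an assumption about how Handler.removeMessages(int, Object) matches on this Android version.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, single-threaded stand-in for (what, obj) matching; not Android's MessageQueue. */
final class PendingTimeouts {
    static final int SERVICE_TIMEOUT_MSG = 12; // value quoted in the article

    private static final class Msg {
        final int what;
        final Object obj;
        final long when;

        Msg(int what, Object obj, long when) {
            this.what = what;
            this.obj = obj;
            this.when = when;
        }
    }

    private final List<Msg> queue = new ArrayList<>();

    /** Queues a message that becomes due at the given time. */
    void sendAtTime(int what, Object obj, long when) {
        queue.add(new Msg(what, obj, when));
    }

    /** Removes pending messages with this what whose obj is the same object; returns the count. */
    int remove(int what, Object obj) {
        int removed = 0;
        for (Iterator<Msg> it = queue.iterator(); it.hasNext();) {
            Msg m = it.next();
            if (m.what == what && m.obj == obj) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Removes and returns the obj of every message that is due at time now. */
    List<Object> drainDue(long now) {
        List<Object> due = new ArrayList<>();
        for (Iterator<Msg> it = queue.iterator(); it.hasNext();) {
            Msg m = it.next();
            if (m.when <= now) {
                due.add(m.obj);
                it.remove();
            }
        }
        return due;
    }

    public static void main(String[] args) {
        PendingTimeouts handler = new PendingTimeouts();
        Object procA = "procA"; // stand-ins for ProcessRecord instances
        Object procB = "procB";
        handler.sendAtTime(SERVICE_TIMEOUT_MSG, procA, 20_000);
        handler.sendAtTime(SERVICE_TIMEOUT_MSG, procB, 20_000);
        handler.remove(SERVICE_TIMEOUT_MSG, procA);    // procA's Service finished in time
        System.out.println(handler.drainDue(20_000));  // prints [procB]
    }
}
```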
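That covers how the delayed message is added and removed. What happens if the Service does not finish within the delay? Anyone familiar with Android's asynchronous message mechanism will know to look at how mAm.mHandler handles SERVICE_TIMEOUT_MSG. mAm.mHandler is an instance of MainHandler, an inner class defined in AMS: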
// ActivityManagerService.java
final class MainHandler extends Handler {
    public MainHandler(Looper looper) {
        super(looper, null, true);
    }
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case SERVICE_TIMEOUT_MSG: {
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            } break;
        }
    }
}
As shown, timeout handling is handed back to the ActiveServices object:
void serviceTimeout(ProcessRecord proc) {
    ...
    if (anrMessage != null) {
        mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
    }
}
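Finally, the AppErrors object notifies the user of the ANR. The method that actually carries out the ANR is not analyzed here.
That completes the source analysis of the Service ANR. The flow is: 1. a delayed message is added before the event runs; 2. the delayed message is removed once the event finishes; 3. if the event has not finished within the delay, the delayed message is handled and an ANR occurs.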
2. BroadcastReceiver
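The full flow of registering and receiving broadcasts is not covered here. What matters is that a receiver is registered with AMS, and once AMS receives a broadcast, the method that ultimately processes it is processNextBroadcast in BroadcastQueue: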
final void processNextBroadcast(boolean fromMsg) {
    ...
    do {
        r = mOrderedBroadcasts.get(0);
        // Count the receivers of this broadcast
        int numReceivers = (r.receivers != null) ? r.receivers.size() : 0;
        if (mService.mProcessesReady && r.dispatchTime > 0) {
            long now = SystemClock.uptimeMillis();
            if ((numReceivers > 0) &&
                    (now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers))) {
                // The broadcast as a whole has timed out: forcibly finish it
                broadcastTimeoutLocked(false);
                ...
            }
        }
        if (r.receivers == null || r.nextReceiver >= numReceivers
                || r.resultAbort || forceReceive) {
            if (r.resultTo != null) {
                // Deliver the broadcast result to the result receiver
                performReceiveLocked(r.callerApp, r.resultTo,
                        new Intent(r.intent), r.resultCode,
                        r.resultData, r.resultExtras, false, false, r.userId);
                r.resultTo = null;
            }
            // Finished: cancel the pending timeout
            cancelBroadcastTimeoutLocked();
            ...
            mOrderedBroadcasts.remove(0);
            ...
        }
    } while (r == null);
    ...
    // Move on to the next receiver of the ordered broadcast
    r.receiverTime = SystemClock.uptimeMillis();
    if (!mPendingBroadcastTimeoutMessage) {
        long timeoutTime = r.receiverTime + mTimeoutPeriod;
        // Post the delayed timeout message; the delay is mTimeoutPeriod
        setBroadcastTimeoutLocked(timeoutTime);
    }
    ...
}
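processNextBroadcast works with two different deadlines: an overall one for the whole broadcast (dispatchTime + 2 * mTimeoutPeriod * numReceivers) and a per-receiver one (receiverTime + mTimeoutPeriod). A short worked example makes the arithmetic concrete. The helper class and the sample numbers below are hypothetical; only the two formulas come from the listing.

```java
/** Hypothetical helper that reproduces the two deadline formulas from processNextBroadcast. */
final class BroadcastDeadlines {

    /** Overall deadline for the broadcast: dispatchTime + 2 * timeoutPeriod * numReceivers. */
    static long overallDeadline(long dispatchTime, long timeoutPeriod, int numReceivers) {
        return dispatchTime + 2 * timeoutPeriod * numReceivers;
    }

    /** Deadline for the receiver that is currently running: receiverTime + timeoutPeriod. */
    static long receiverDeadline(long receiverTime, long timeoutPeriod) {
        return receiverTime + timeoutPeriod;
    }

    public static void main(String[] args) {
        long fgTimeout = 10_000;  // foreground queue timeout from the article, in ms
        long dispatchTime = 1_000; // sample uptime at which dispatch began (made up)
        int numReceivers = 3;      // sample receiver count (made up)

        // 1_000 + 2 * 10_000 * 3 = 61_000
        System.out.println(overallDeadline(dispatchTime, fgTimeout, numReceivers));
        // First receiver starts at 1_000, so it must finish by 11_000
        System.out.println(receiverDeadline(dispatchTime, fgTimeout));
    }
}
```

With three receivers on the foreground queue, each receiver gets 10 seconds, while the broadcast as a whole is forcibly finished once 60 seconds have passed since dispatch.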
processNextBroadcast calls setBroadcastTimeoutLocked to post the delayed message and, once all registered receivers have finished, calls cancelBroadcastTimeoutLocked to remove it. Here are those two methods:
final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (!mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}
final void cancelBroadcastTimeoutLocked() {
    if (mPendingBroadcastTimeoutMessage) {
        mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
        mPendingBroadcastTimeoutMessage = false;
    }
}
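setBroadcastTimeoutLocked and cancelBroadcastTimeoutLocked simply add and remove the message. timeoutTime is computed as r.receiverTime + mTimeoutPeriod, where receiverTime is the current uptime from SystemClock.uptimeMillis() and mTimeoutPeriod is passed in when the BroadcastQueue is constructed. The BroadcastQueue instances themselves are created in AMS: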
// ActivityManagerService.java
// Foreground broadcast timeout
static final int BROADCAST_FG_TIMEOUT = 10*1000;
// Background broadcast timeout
static final int BROADCAST_BG_TIMEOUT = 60*1000;
// Foreground broadcast queue
BroadcastQueue mFgBroadcastQueue;
// Background broadcast queue
BroadcastQueue mBgBroadcastQueue;
public ActivityManagerService(Context systemContext) {
    mFgBroadcastQueue = new BroadcastQueue(this, mHandler,
            "foreground", BROADCAST_FG_TIMEOUT, false);
    mBgBroadcastQueue = new BroadcastQueue(this, mHandler,
            "background", BROADCAST_BG_TIMEOUT, true);
}
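So AMS maintains a foreground broadcast queue and a background broadcast queue, with timeouts of 10 seconds and 60 seconds respectively. Now for how the timeout message is handled. The mHandler that sends it is an instance of BroadcastHandler, an inner class of BroadcastQueue: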
final BroadcastHandler mHandler;
private final class BroadcastHandler extends Handler {
    public BroadcastHandler(Looper looper) {
        super(looper, null, true);
    }
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case BROADCAST_INTENT_MSG: {
                if (DEBUG_BROADCAST) Slog.v(
                        TAG_BROADCAST, "Received BROADCAST_INTENT_MSG");
                processNextBroadcast(true);
            } break;
            case BROADCAST_TIMEOUT_MSG: {
                synchronized (mService) {
                    broadcastTimeoutLocked(true);
                }
            } break;
        }
    }
}
On timeout, broadcastTimeoutLocked runs and triggers the ANR:
final void broadcastTimeoutLocked(boolean fromMsg) {
    ...
    if (anrMessage != null) {
        // Post the ANR to the handler since we do not want to process ANRs while
        // potentially holding our lock.
        mHandler.post(new AppNotResponding(app, anrMessage));
    }
}
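The comment in broadcastTimeoutLocked points at a general pattern: do the bookkeeping under the lock, but defer the slow reporting step so that it does not run while the lock is held. The sketch below shows the pattern in plain Java. It is a hypothetical analogue in which a separate executor thread stands in for the Handler. In the real code the Runnable is posted back to the queue's own Handler, so this illustrates the idea, not the exact threading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of "detect under the lock, report outside it"; not AOSP code. */
final class DeferredReportSketch {
    private final Object lock = new Object();
    // Stand-in for the Handler that the ANR Runnable is posted to.
    private final ExecutorService handler = Executors.newSingleThreadExecutor();
    private final List<String> reports = new CopyOnWriteArrayList<>();

    /** Called when a timeout is detected; mirrors the shape of broadcastTimeoutLocked(). */
    void onTimeout(String anrMessage) {
        synchronized (lock) {
            // ... bookkeeping that needs the lock would go here ...
            // Hand the slow part off instead of running it inline under the lock.
            handler.execute(() -> report(anrMessage));
        }
    }

    private void report(String anrMessage) {
        // Runs on the handler thread, which never takes the lock in this sketch.
        reports.add(anrMessage + " (lock held: " + Thread.holdsLock(lock) + ")");
    }

    /** Waits for queued reports to finish and returns a copy of them. */
    List<String> shutdownAndGetReports() {
        handler.shutdown();
        try {
            handler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(reports);
    }

    public static void main(String[] args) {
        DeferredReportSketch sketch = new DeferredReportSketch();
        sketch.onTimeout("Broadcast timed out");
        System.out.println(sketch.shutdownAndGetReports());
    }
}
```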
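That covers the source behind Broadcast ANRs. The flow matches the Service case: 1. a delayed message is added before the event runs; 2. the delayed message is removed once the event finishes; 3. if the event has not finished within the delay, the delayed message is handled and an ANR occurs.
3. ContentProvider
A ContentProvider timeout is triggered when AMS.MainHandler, running on the ActivityManager thread, receives a CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG message. The logic mirrors that of Service and BroadcastReceiver, so the source is not analyzed here; interested readers can look it up themselves.
Similarly, when AMS starts an Activity, it applies comparable timeouts to starting and pausing the Activities involved. The pause timeout is 500 milliseconds, so avoid time-consuming work in onPause and move it to onStop instead, since the timeouts for onStop and onDestroy are both 10 seconds.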
// How long we wait until giving up on the last activity to pause. This
// is short because it directly impacts the responsiveness of starting the
// next activity.
private static final int PAUSE_TIMEOUT = 500;
// How long we wait for the activity to tell us it has stopped before
// giving up. This is a good amount of time because we really need this
// from the application in order to get its saved state.
private static final int STOP_TIMEOUT = 10 * 1000;
// How long we wait until giving up on an activity telling us it has
// finished destroying itself.
private static final int DESTROY_TIMEOUT = 10 * 1000;
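How to Avoid ANRs
At its core, the ANR mechanism monitors whether the main thread is blocked. To avoid ANRs, keep the following in mind:
- Avoid time-consuming operations on the main thread.
- If a Service, BroadcastReceiver, or ContentProvider needs to do time-consuming work, run it asynchronously using a suitable threading technique.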
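The second point can be illustrated in plain Java: the callback hands the slow work to a worker thread and returns at once, so the calling thread is never blocked. This is a minimal sketch with hypothetical names (OffloadSketch, onEvent), using java.util.concurrent, not Android components.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical sketch of moving slow work off the calling thread; not Android code. */
final class OffloadSketch {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Stand-in for a callback such as onReceive(): returns immediately, work continues elsewhere. */
    <T> CompletableFuture<T> onEvent(Supplier<T> slowWork) {
        return CompletableFuture.supplyAsync(slowWork, worker);
    }

    void shutdown() {
        worker.shutdown();
    }

    /** Waits on the latch, restoring the interrupt flag if interrupted. */
    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        OffloadSketch sketch = new OffloadSketch();
        CountDownLatch gate = new CountDownLatch(1); // keeps the "slow" work pending
        CompletableFuture<String> result = sketch.onEvent(() -> {
            awaitQuietly(gate);
            return "done";
        });
        // The callback has already returned although the work is still pending.
        System.out.println("work finished yet? " + result.isDone());
        gate.countDown();
        System.out.println("result: " + result.join());
        sketch.shutdown();
    }
}
```

The same shape applies whichever threading tool an app actually uses: the callback only schedules the work, and the result is delivered later.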