How Android ANRs Are Triggered

2018-11-27  CyanStone

Overview

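ANR is short for Application Not Responding. In Android, ActivityManagerService and WindowManagerService monitor how quickly an app responds. Before the app's main thread handles certain events, the system uses a Handler belonging to AMS, BroadcastQueue, or a related class to post a delayed message to a Looper in the system process. If the event is handled within the delay, that delayed message is removed from the MessageQueue. If it is not, the message stays in the queue, the Looper eventually takes it out and handles it, and handling it triggers the ANR. That is the principle behind ANRs.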
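To make the principle concrete outside of Android, here is a minimal plain-Java sketch. It assumes a ScheduledExecutorService can stand in for the system-side Handler and Looper; TimeoutWatchdog and its methods are hypothetical names, not AOSP APIs. The timeout task is armed before the work starts and cancelled when the work finishes; if the work overruns, the timeout task has already run, which corresponds to the ANR being triggered.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical plain-Java analogue of "arm a delayed message, disarm on completion".
// The scheduler thread plays the role of the system-side Looper.
final class TimeoutWatchdog {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean anrFired = new AtomicBoolean(false);

    /** Runs task on the calling thread, with a timeout armed for timeoutMs. */
    void runWithTimeout(Runnable task, long timeoutMs) {
        // 1. Before the event: post the delayed "timeout message".
        ScheduledFuture<?> timeout = scheduler.schedule(
                () -> anrFired.set(true), timeoutMs, TimeUnit.MILLISECONDS);
        try {
            task.run();
        } finally {
            // 2. After the event: remove the pending timeout.
            // 3. If the task overran, the timeout already ran and this cancel is a no-op.
            timeout.cancel(false);
        }
    }

    boolean anrFired() {
        return anrFired.get();
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The scheduler owns a non-daemon thread, so call shutdown() when done. Cancelling a timeout that has already run has no effect, much like removing a message that the Looper has already handled.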


Conditions that trigger an ANR
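The source analysis below quotes several of these timeouts. As a compact reference they can be collected in a Java enum; AnrTimeout is a hypothetical name, the values are the Android 8.0 ones quoted in this article, and ContentProvider is omitted because the article gives no value for it.

```java
// Hypothetical summary of the timeouts quoted in this article (Android 8.0), in milliseconds.
enum AnrTimeout {
    SERVICE_FOREGROUND(20_000),    // ActiveServices.SERVICE_TIMEOUT
    SERVICE_BACKGROUND(200_000),   // ActiveServices.SERVICE_BACKGROUND_TIMEOUT
    BROADCAST_FOREGROUND(10_000),  // ActivityManagerService.BROADCAST_FG_TIMEOUT
    BROADCAST_BACKGROUND(60_000),  // ActivityManagerService.BROADCAST_BG_TIMEOUT
    ACTIVITY_PAUSE(500),           // PAUSE_TIMEOUT
    ACTIVITY_STOP(10_000),         // STOP_TIMEOUT
    ACTIVITY_DESTROY(10_000);      // DESTROY_TIMEOUT

    final long millis;

    AnrTimeout(long millis) {
        this.millis = millis;
    }

    /** True if an operation that took elapsedMs ran past this budget. */
    boolean exceededBy(long elapsedMs) {
        return elapsedMs > millis;
    }
}
```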


Source code analysis (based on Android 8.0)

1. Service

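ActiveServices is an object managed by AMS that handles starting, stopping, and binding Services. The full Service start flow is not analyzed here; instead we look at realStartServiceLocked, the method that actually starts the Service after we call ContextImpl.startService():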

private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app,
    boolean execInFg) throws RemoteException {
    ...
    //Posts the delayed timeout message
    bumpServiceExecutingLocked(r, execInFg, "create");
    ...
    //Creates the Service and runs its onCreate method (not analyzed further here)
    app.thread.scheduleCreateService(r, r.serviceInfo, 
        mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
        app.repProcState);
    ...
}

Next, let's look at bumpServiceExecutingLocked:

private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
  ...
  scheduleServiceTimeoutLocked(r.app);
  ...
}

void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    //execServicesFg indicates whether the Services must execute in the foreground
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}

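Here mAm.mHandler posts a delayed message (whose what value is ActivityManagerService.SERVICE_TIMEOUT_MSG) to the MessageQueue of the thread that AMS's handler runs on. The delay depends on whether the Service must run in the foreground: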

//Defined in ActivityManagerService: the what value of the Service timeout message
static final int SERVICE_TIMEOUT_MSG = 12;
//ActiveServices.java
// How long we wait for a service to finish executing.
//Timeout for a foreground Service: 20 seconds
static final int SERVICE_TIMEOUT = 20*1000;
// How long we wait for a service to finish executing.
//Timeout for a background Service is 10 times the foreground one: 200 seconds
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;
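The guard and delay selection in scheduleServiceTimeoutLocked can be modelled in plain Java. ProcessModel is a hypothetical stand-in for the ProcessRecord fields used above, and instead of touching a real Handler the sketch only returns the delay that would be passed to sendMessageDelayed, or -1 when no message would be posted.

```java
// Hypothetical model of the logic in scheduleServiceTimeoutLocked (not AOSP code).
final class ServiceTimeoutModel {
    static final long SERVICE_TIMEOUT = 20 * 1000;
    static final long SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

    // Stand-in for the ProcessRecord fields the real method reads.
    static final class ProcessModel {
        int executingServices;      // proc.executingServices.size()
        boolean hasThread = true;   // proc.thread != null
        boolean execServicesFg;     // proc.execServicesFg
    }

    /** Delay for the SERVICE_TIMEOUT_MSG, or -1 when no message is posted. */
    static long timeoutDelayFor(ProcessModel proc) {
        if (proc.executingServices == 0 || !proc.hasThread) {
            return -1;
        }
        return proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT;
    }
}
```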

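This posts the delayed message to the MessageQueue of the AMS handler thread and starts the Service. If the Service finishes before the delay expires, the queued message must be removed. Where does that happen? Following the call chain, the answer is in ActivityThread's handleCreateService method: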

private void handleCreateService(CreateServiceData data) {
    ...
    Service service = null;
    java.lang.ClassLoader cl = loadedApk.getClassLoader();
    //Creates the Service instance via reflection
    service = (Service) cl.loadClass(data.info.name).newInstance();
    ...
    //Runs the Service's onCreate method
    service.onCreate();
    ...
    //Tells AMS over Binder that the Service has finished executing
    ActivityManager.getService().serviceDoneExecuting(data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
    ...
}

After the Service's onCreate method finishes, a Binder call to serviceDoneExecuting in AMS reports that the Service has started. Here is serviceDoneExecuting in AMS:

public void serviceDoneExecuting(IBinder token, int type, int startId, int res) {
    synchronized(this) {
        ...
        mServices.serviceDoneExecutingLocked((ServiceRecord) token, type, startId, res);
    }
}

AMS's serviceDoneExecuting simply delegates to serviceDoneExecutingLocked in ActiveServices:

void serviceDoneExecutingLocked(ServiceRecord r, int type, int startId, int res) { 
    ...
    serviceDoneExecutingLocked(r, inDestroying, inDestroying);
    ...
}

private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
        boolean finishing) {
    ...
    //Removes the delayed message that was posted earlier
    mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
    ...
}
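Why does the code set msg.obj = proc? On a given Handler, removeMessages(what, object) removes only messages whose what matches and whose obj is the same object (compared by identity; a null object matches any obj). The sketch below models that selection rule with a plain list; MiniQueue and Msg are hypothetical, and timestamps are plain numbers rather than real uptime values. Keying the timeout by process means a Service finishing in one process does not clear the pending timeout of another.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how removeMessages(what, obj) selects messages to drop.
final class MiniQueue {
    static final class Msg {
        final int what;
        final Object obj;
        final long when;

        Msg(int what, Object obj, long when) {
            this.what = what;
            this.obj = obj;
            this.when = when;
        }
    }

    private final List<Msg> pending = new ArrayList<>();

    void sendAtTime(int what, Object obj, long when) {
        pending.add(new Msg(what, obj, when));
    }

    // Same `what` AND same object (identity); a null obj argument matches everything.
    void removeMessages(int what, Object obj) {
        pending.removeIf(m -> m.what == what && (obj == null || m.obj == obj));
    }

    /** Messages whose time has come, in the order they were sent. */
    List<Msg> due(long now) {
        List<Msg> out = new ArrayList<>();
        for (Msg m : pending) {
            if (m.when <= now) {
                out.add(m);
            }
        }
        return out;
    }

    int size() {
        return pending.size();
    }
}
```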

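That covers both adding and removing the delayed message. What happens if the Service does not finish within the delay? Anyone familiar with Android's asynchronous message mechanism knows to look at how mAm.mHandler handles SERVICE_TIMEOUT_MSG. mAm.mHandler is an instance of MainHandler, an inner class defined in AMS: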

//ActivityManagerService.java
final class MainHandler extends Handler {
    public MainHandler(Looper looper) {
        super(looper, null, true);
    }
    @Override 
    public void handleMessage(Message msg) {
        switch (msg.what) {
            ...
            case SERVICE_TIMEOUT_MSG: {
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            } break;
            ...
        }
    }
}
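What the system-side Looper does with an expired message can be imitated with java.util.concurrent.DelayQueue, whose poll() returns an element only once its delay has expired. In this hypothetical sketch, DelayedMessage and MiniLooper are illustrative names; drainExpired() hands every expired message to a handleMessage-style callback, whereas a real Looper blocks and waits for the next message.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical message type: becomes available once its delay has elapsed.
final class DelayedMessage implements Delayed {
    final int what;
    final Object obj;
    private final long dueNanos;

    DelayedMessage(int what, Object obj, long delayMs) {
        this.what = what;
        this.obj = obj;
        this.dueNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(dueNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

// Hypothetical stand-in for a Looper plus its MessageQueue.
final class MiniLooper {
    final DelayQueue<DelayedMessage> queue = new DelayQueue<>();

    /** Dispatches every expired message; returns how many were handled. */
    int drainExpired(Consumer<DelayedMessage> handleMessage) {
        int handled = 0;
        DelayedMessage m;
        while ((m = queue.poll()) != null) { // poll() returns only expired elements
            handleMessage.accept(m);
            handled++;
        }
        return handled;
    }
}
```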

MainHandler hands the timeout over to the ActiveServices object:

void serviceTimeout(ProcessRecord proc) {
    ...
    if (anrMessage != null) {
        mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
    }
}

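Finally, the AppErrors object notifies the user of the ANR; the method that carries out the ANR handling is not analyzed further here.
That completes the source analysis of ANRs for Services. The flow is: 1. post a delayed message before the event is executed; 2. remove the delayed message once the event finishes; 3. if the event does not finish within the delay, the delayed message is handled and an ANR occurs.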

2. BroadcastReceiver

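We won't analyze the full registration and delivery flow for broadcasts here. What matters is that registering a receiver actually registers it with AMS, and when AMS receives a broadcast, the processing ultimately happens in processNextBroadcast in BroadcastQueue: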

final void processNextBroadcast(boolean fromMsg) {
    ...
        do {
            r = mOrderedBroadcasts.get(0);
            //Number of receivers of this broadcast
            int numReceivers = (r.receivers != null) ? r.receivers.size() : 0;
            if (mService.mProcessesReady && r.dispatchTime > 0) {
                long now = SystemClock.uptimeMillis();
                if ((numReceivers > 0) &&
                        (now > r.dispatchTime + (2*mTimeoutPeriod*numReceivers))) {
                    //The whole broadcast has run too long: force-finish it
                    broadcastTimeoutLocked(false);
                    ...
                }
            }
            if (r.receivers == null || r.nextReceiver >= numReceivers
                    || r.resultAbort || forceReceive) {
                if (r.resultTo != null) {
                    //Deliver the final result to the resultTo receiver
                    performReceiveLocked(r.callerApp, r.resultTo,
                        new Intent(r.intent), r.resultCode,
                        r.resultData, r.resultExtras, false, false, r.userId);
                    r.resultTo = null;
                }
                //Broadcast finished: cancel the timeout
                cancelBroadcastTimeoutLocked();
                ...
                mOrderedBroadcasts.remove(0);
                ...
            }
        } while (r == null);
        ...

        //Record when this receiver started and make sure a timeout is pending
        r.receiverTime = SystemClock.uptimeMillis();
        if (!mPendingBroadcastTimeoutMessage) {
            long timeoutTime = r.receiverTime + mTimeoutPeriod;
            //Post the delayed message; the delay is mTimeoutPeriod
            setBroadcastTimeoutLocked(timeoutTime);
        }
        ...
}
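The first check inside the loop caps the total time of an ordered broadcast at 2 × mTimeoutPeriod × numReceivers, measured from r.dispatchTime. A small sketch of just that condition follows; the mProcessesReady and dispatchTime > 0 preconditions are left out, and BroadcastBudget is a hypothetical name. With the 10-second foreground timeout and 3 receivers, the whole broadcast gets 2 × 10 s × 3 = 60 s.

```java
// Hypothetical extraction of the whole-broadcast deadline check in processNextBroadcast.
final class BroadcastBudget {
    /** True when the broadcast as a whole has exceeded its time budget. */
    static boolean overBudget(long now, long dispatchTime, long timeoutPeriod, int numReceivers) {
        return numReceivers > 0
                && now > dispatchTime + 2 * timeoutPeriod * numReceivers;
    }
}
```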

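As processNextBroadcast shows, setBroadcastTimeoutLocked posts the delayed message, and once the broadcast has been delivered to all of its registered receivers, the delayed message is removed. Next, let's look at setBroadcastTimeoutLocked and cancelBroadcastTimeoutLocked: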

final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (!mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}

final void cancelBroadcastTimeoutLocked() {
    if (mPendingBroadcastTimeoutMessage) {
        mHandler.removeMessages(BROADCAST_TIMEOUT_MSG, this);
        mPendingBroadcastTimeoutMessage = false;
    }
}
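The mPendingBroadcastTimeoutMessage flag makes arming idempotent: at most one timeout message per queue is outstanding, and cancelling when nothing is pending does nothing. Below is a plain-Java mirror, assuming a ScheduledExecutorService in place of the Handler; SingleTimeout is hypothetical, and the null check on its pending field plays the role of the flag. For brevity the sketch leaves the timeout marked pending after it fires, so the caller is expected to call cancel() before arming again.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical mirror of setBroadcastTimeoutLocked / cancelBroadcastTimeoutLocked.
final class SingleTimeout {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending; // null means "no timeout armed"
    private int armCount;               // how many times a timeout was actually armed

    synchronized void set(long delayMs, Runnable onTimeout) {
        if (pending == null) {          // like !mPendingBroadcastTimeoutMessage
            pending = scheduler.schedule(onTimeout, delayMs, TimeUnit.MILLISECONDS);
            armCount++;
        }
    }

    synchronized void cancel() {
        if (pending != null) {          // like mPendingBroadcastTimeoutMessage
            pending.cancel(false);
            pending = null;
        }
    }

    synchronized boolean isPending() {
        return pending != null;
    }

    synchronized int armCount() {
        return armCount;
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```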

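setBroadcastTimeoutLocked and cancelBroadcastTimeoutLocked simply add and remove the message. timeoutTime is r.receiverTime + mTimeoutPeriod, where receiverTime is the current time as returned by SystemClock.uptimeMillis(), and mTimeoutPeriod is passed in when the BroadcastQueue is constructed. The BroadcastQueue instances are constructed in AMS: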

//ActivityManagerService.java
//Foreground broadcast timeout
static final int BROADCAST_FG_TIMEOUT = 10*1000;
//Background broadcast timeout
static final int BROADCAST_BG_TIMEOUT = 60*1000;

//Foreground broadcast queue
BroadcastQueue mFgBroadcastQueue;
//Background broadcast queue
BroadcastQueue mBgBroadcastQueue;

public ActivityManagerService(Context systemContext) {
    ...
    mFgBroadcastQueue = new BroadcastQueue(this, mHandler,
                "foreground", BROADCAST_FG_TIMEOUT, false);
    mBgBroadcastQueue = new BroadcastQueue(this, mHandler,
                "background", BROADCAST_BG_TIMEOUT, true);
    ...
}

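So AMS maintains a foreground broadcast queue and a background broadcast queue, with timeouts of 10 seconds and 60 seconds respectively. Now let's see how the timeout message is handled. The mHandler that sends it is an instance of BroadcastHandler, an inner class of BroadcastQueue: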

final BroadcastHandler mHandler;
private final class BroadcastHandler extends Handler {
    public BroadcastHandler(Looper looper) {
        super(looper, null, true);
    }
    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case BROADCAST_INTENT_MSG: {
                if (DEBUG_BROADCAST) Slog.v(
                        TAG_BROADCAST, "Received BROADCAST_INTENT_MSG");
                processNextBroadcast(true);
            } break;
            case BROADCAST_TIMEOUT_MSG: {
                synchronized (mService) {
                    broadcastTimeoutLocked(true);
                }
            } break;
        }
    }
}

When the timeout message is handled, broadcastTimeoutLocked runs and triggers the ANR:

final void broadcastTimeoutLocked(boolean fromMsg) {
    ...
    if (anrMessage != null) {
        // Post the ANR to the handler since we do not want to process ANRs while
        // potentially holding our lock.
        mHandler.post(new AppNotResponding(app, anrMessage));
    }
}
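The comment explains why broadcastTimeoutLocked posts the ANR instead of reporting it directly: the method runs while a lock is held, and ANR reporting is slow. Here is the same "decide under the lock, report on another thread" pattern in plain Java, assuming a single-thread executor can stand in for the handler; DeferredAnrReporter and onTimeoutLocked are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: the caller holds `lock`, but the report runs on another thread.
final class DeferredAnrReporter {
    // Stands in for the lock held while broadcastTimeoutLocked runs.
    final Object lock = new Object();
    // Stands in for mHandler: one thread that runs posted work in order.
    private final ExecutorService handler = Executors.newSingleThreadExecutor();

    /**
     * Intended to be called with `lock` held. Returns null when there is nothing
     * to report; otherwise the report is posted and runs later on the handler
     * thread, which never holds `lock`.
     */
    Future<?> onTimeoutLocked(String anrMessage, Consumer<String> appNotResponding) {
        if (anrMessage == null) {
            return null;
        }
        return handler.submit(() -> appNotResponding.accept(anrMessage));
    }

    void shutdown() {
        handler.shutdownNow();
    }
}
```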

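That completes the source analysis of how a broadcast triggers an ANR. The flow is the same as for Services: 1. post a delayed message before the event is executed; 2. remove the delayed message once the event finishes; 3. if the event does not finish within the delay, the delayed message is handled and an ANR occurs.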

3. ContentProvider

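A ContentProvider timeout is triggered when AMS.MainHandler, running on the ActivityManager thread, receives a CONTENT_PROVIDER_PUBLISH_TIMEOUT_MSG message. The logic is the same as for Services and BroadcastReceivers, so the source is not analyzed here; interested readers can look it up.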

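Similarly, when AMS starts an Activity, it applies comparable timeout handling to starting and pausing the related Activities. The pause timeout is only 500 milliseconds, so avoid time-consuming work in onPause and move it to onStop instead, because the onStop and onDestroy timeouts are both 10 s.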

    // How long we wait until giving up on the last activity to pause.  This
    // is short because it directly impacts the responsiveness of starting the
    // next activity.
    private static final int PAUSE_TIMEOUT = 500;

    // How long we wait for the activity to tell us it has stopped before
    // giving up.  This is a good amount of time because we really need this
    // from the application in order to get its saved state.
    private static final int STOP_TIMEOUT = 10 * 1000;

    // How long we wait until giving up on an activity telling us it has
    // finished destroying itself.
    private static final int DESTROY_TIMEOUT = 10 * 1000;
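A rough way to check the onPause advice during development is to time the body of the callback against the 500 ms budget. PauseBudgetCheck is a hypothetical helper, not an Android API, and it measures only the callback body, not the rest of the pause transition.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical development-time helper: compares a callback body with PAUSE_TIMEOUT.
final class PauseBudgetCheck {
    static final long PAUSE_TIMEOUT_MS = 500; // value quoted above

    /** Runs the body and returns how long it took, in milliseconds. */
    static long timeMillis(Runnable body) {
        long start = System.nanoTime();
        body.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** True if the body alone already took longer than the pause budget. */
    static boolean exceedsPauseBudget(Runnable onPauseBody) {
        return timeMillis(onPauseBody) > PAUSE_TIMEOUT_MS;
    }
}
```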

How to avoid ANRs

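The ANR mechanism in Android essentially monitors whether the main thread is blocked. So to avoid ANRs, remember one rule: never perform time-consuming operations on the main thread.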
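In plain Java the rule looks like this: a single-thread executor stands in for the main thread, the heavy computation runs on a worker pool, and only the result is handed back to the "main thread". OffloadExample is hypothetical; on Android the hand-back would typically go through a Handler bound to the main Looper rather than an executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: keep the "main thread" free by moving heavy work to workers.
final class OffloadExample {
    // Single thread that plays the role of the app's main thread.
    final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // Pool for time-consuming work.
    final ExecutorService workers = Executors.newFixedThreadPool(2);

    /** Sums 1..n on a worker, then delivers the result on the "main thread". */
    CompletableFuture<Void> sumInBackground(long n, Consumer<Long> onResult) {
        return CompletableFuture
                .supplyAsync(() -> {
                    long sum = 0;
                    for (long i = 1; i <= n; i++) {
                        sum += i;
                    }
                    return sum;
                }, workers)
                .thenAcceptAsync(onResult, mainThread);
    }

    void shutdown() {
        mainThread.shutdownNow();
        workers.shutdownNow();
    }
}
```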


