BlockCanary Source Code Analysis

2017-09-09, by 大大大大大先生

How BlockCanary Works

Measuring how long methods take on the main thread

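The crudest way to time a method is to record the start time before it runs and, once it returns, subtract that start time from the current time. But the main thread runs a huge number of methods, and instrumenting every one of them this way is clearly not practical. Is there a single place where this can be done for all of them? Of course there is, otherwise this post would end right here...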

    public static void loop() {
        final Looper me = myLooper();
        if (me == null) {
            throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
        }
        final MessageQueue queue = me.mQueue;

        // Make sure the identity of this thread is that of the local process,
        // and keep track of what that identity token actually is.
        Binder.clearCallingIdentity();
        final long ident = Binder.clearCallingIdentity();

        for (;;) {
            Message msg = queue.next(); // might block
            if (msg == null) {
                // No message indicates that the message queue is quitting.
                return;
            }

            // This must be in a local variable, in case a UI event sets the logger
            final Printer logging = me.mLogging;
            if (logging != null) {
                logging.println(">>>>> Dispatching to " + msg.target + " " +
                        msg.callback + ": " + msg.what);
            }

            final long traceTag = me.mTraceTag;
            if (traceTag != 0 && Trace.isTagEnabled(traceTag)) {
                Trace.traceBegin(traceTag, msg.target.getTraceName(msg));
            }
            try {
                msg.target.dispatchMessage(msg);
            } finally {
                if (traceTag != 0) {
                    Trace.traceEnd(traceTag);
                }
            }

            if (logging != null) {
                logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
            }

            // Make sure that during the course of dispatching the
            // identity of the thread wasn't corrupted.
            final long newIdent = Binder.clearCallingIdentity();
            if (ident != newIdent) {
                Log.wtf(TAG, "Thread identity changed from 0x"
                        + Long.toHexString(ident) + " to 0x"
                        + Long.toHexString(newIdent) + " while dispatching to "
                        + msg.target.getClass().getName() + " "
                        + msg.callback + " what=" + msg.what);
            }

            msg.recycleUnchecked();
        }
    }

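The loop() method above belongs to Looper. Our goal is to detect jank on the main thread: all UI updates happen on the main thread, so time-consuming work there can make the interface stutter. The Android framework creates the main thread's Looper (the main looper) when the app starts. A thread has one Looper, and each Looper owns a MessageQueue that receives the messages sent through Handlers. loop() repeatedly takes a msg from the MessageQueue and dispatches it, strictly one after another: until the current msg has been handled, loop() does not move on to the next one. If one msg is handled slowly, everything queued behind it is delayed, and that is what shows up as jank. Right before msg.target.dispatchMessage(msg) there is: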

if (logging != null) {
    logging.println(">>>>> Dispatching to " + msg.target + " " +
            msg.callback + ": " + msg.what);
}

And right after dispatchMessage returns there is:

if (logging != null) {
    logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
}

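So all we need is the time difference between these two log lines to get the cost of dispatchMessage. Android provides Looper.getMainLooper().setMessageLogging(Printer printer) to set this logging object, so a custom Printer that overrides println(String x) is enough to collect the timing. The principle really is simple, but spotting the trick is the valuable part. BlockCanary is initialized like this: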

public class MyApplication extends Application {

    @Override
    public void onCreate() {
        super.onCreate();
        BlockCanary.install(this, new BlockCanaryContext()).start();

    }
}

Now look at the code in start():

    /**
     * Start monitoring.
     */
    public void start() {
        if (!mMonitorStarted) {
            mMonitorStarted = true;
            Looper.getMainLooper().setMessageLogging(mBlockCanaryCore.monitor);
        }
    }

monitor is an instance of LooperMonitor, a class that implements the Printer interface and overrides println(String x):

    @Override
    public void println(String x) {
        if (mStopWhenDebugging && Debug.isDebuggerConnected()) {
            return;
        }
        if (!mPrintingStarted) {
            mStartTimestamp = System.currentTimeMillis();
            mStartThreadTimestamp = SystemClock.currentThreadTimeMillis();
            mPrintingStarted = true;
            startDump();
        } else {
            final long endTime = System.currentTimeMillis();
            mPrintingStarted = false;
            if (isBlock(endTime)) {
                notifyBlockEvent(endTime);
            }
            stopDump();
        }
    }

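When the log line before dispatchMessage is printed, println runs with mPrintingStarted set to false, so it records the current wall-clock time and the current thread time, sets mPrintingStarted to true, and starts the samplers via startDump(). When the second log line is printed after dispatchMessage returns, println runs again with mPrintingStarted set to true. At that point dispatchMessage has finished, so the elapsed time can be computed and, if isBlock(endTime) reports a block, the collected stack traces, CPU information and so on are reported through notifyBlockEvent.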
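The toggle logic described above can be tried outside Android. The following is only a minimal sketch: LinePrinter is a hypothetical stand-in for android.util.Printer, DispatchTimer is an illustrative name, the clock is injected so the behaviour can be checked deterministically, and the rule "cost greater than the threshold counts as a block" is an assumption, since the body of isBlock is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for android.util.Printer so the sketch runs on a plain JVM. */
interface LinePrinter {
    void println(String x);
}

/** Times each dispatch from the pair of log lines the looper prints around it. */
final class DispatchTimer implements LinePrinter {
    private final long thresholdMs;      // assumed rule: slower than this counts as a block
    private final LongSupplier clockMs;  // injected clock, e.g. System::currentTimeMillis
    private boolean started;             // true between the ">>>>>" and "<<<<<" lines
    private long startMs;
    private final List<Long> blockedCostsMs = new ArrayList<>();

    DispatchTimer(long thresholdMs, LongSupplier clockMs) {
        this.thresholdMs = thresholdMs;
        this.clockMs = clockMs;
    }

    @Override
    public void println(String x) {
        if (!started) {
            // First line of the pair: the dispatch is about to begin.
            startMs = clockMs.getAsLong();
            started = true;
        } else {
            // Second line of the pair: the dispatch has just finished.
            long costMs = clockMs.getAsLong() - startMs;
            started = false;
            if (costMs > thresholdMs) {
                blockedCostsMs.add(costMs);
            }
        }
    }

    /** Costs, in milliseconds, of the dispatches that exceeded the threshold. */
    List<Long> blockedCostsMs() {
        return blockedCostsMs;
    }
}
```

On Android, a class like this would implement android.util.Printer directly and be passed to Looper.getMainLooper().setMessageLogging(...). Like the original, it relies on the two log lines always arriving in pairs.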

Collecting stack traces

    private void startDump() {
        if (null != BlockCanaryInternals.getInstance().stackSampler) {
            BlockCanaryInternals.getInstance().stackSampler.start();
        }

        if (null != BlockCanaryInternals.getInstance().cpuSampler) {
            BlockCanaryInternals.getInstance().cpuSampler.start();
        }
    }

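LooperMonitor's println method calls this method, which starts collecting stack traces and CPU information at the same time. For the stack traces, stackSampler.start() is implemented in: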

abstract class AbstractSampler {

    private static final int DEFAULT_SAMPLE_INTERVAL = 300;

    protected AtomicBoolean mShouldSample = new AtomicBoolean(false);
    protected long mSampleInterval;

    private Runnable mRunnable = new Runnable() {
        @Override
        public void run() {
            doSample();

            if (mShouldSample.get()) {
                HandlerThreadFactory.getTimerThreadHandler()
                        .postDelayed(mRunnable, mSampleInterval);
            }
        }
    };

    public AbstractSampler(long sampleInterval) {
        if (0 == sampleInterval) {
            sampleInterval = DEFAULT_SAMPLE_INTERVAL;
        }
        mSampleInterval = sampleInterval;
    }

    public void start() {
        if (mShouldSample.get()) {
            return;
        }
        mShouldSample.set(true);

        HandlerThreadFactory.getTimerThreadHandler().removeCallbacks(mRunnable);
        HandlerThreadFactory.getTimerThreadHandler().postDelayed(mRunnable,
                BlockCanaryInternals.getInstance().getSampleDelay());
    }

    public void stop() {
        if (!mShouldSample.get()) {
            return;
        }
        mShouldSample.set(false);
        HandlerThreadFactory.getTimerThreadHandler().removeCallbacks(mRunnable);
    }

    abstract void doSample();
}
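The scheduling pattern in this class (a delayed first sample, then a repeat at a fixed interval until stop() is called) can be approximated on a plain JVM. The sketch below is not BlockCanary code: ThreadStackSampler is a hypothetical name, and it swaps the Handler plus self-reposting Runnable for ScheduledExecutorService.scheduleWithFixedDelay, which gives a similar rhythm without Android classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical plain-JVM analogue of the sampler above; not BlockCanary code. */
final class ThreadStackSampler {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "sketch-sampler");
                t.setDaemon(true); // do not keep the JVM alive because of the sampler
                return t;
            });
    private final Thread target;
    private final long initialDelayMs;
    private final long intervalMs;      // must be greater than zero
    private final List<String> samples = new ArrayList<>();
    private ScheduledFuture<?> pending; // non-null while sampling is active

    ThreadStackSampler(Thread target, long initialDelayMs, long intervalMs) {
        this.target = target;
        this.initialDelayMs = initialDelayMs;
        this.intervalMs = intervalMs;
    }

    /** Starts sampling; a second call while sampling is active is a no-op. */
    synchronized void start() {
        if (pending != null) {
            return;
        }
        pending = timer.scheduleWithFixedDelay(
                this::doSample, initialDelayMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    /** Stops sampling; a sample that is already in progress may still complete. */
    synchronized void stop() {
        if (pending == null) {
            return;
        }
        pending.cancel(false);
        pending = null;
    }

    /** Captures the target thread's current stack as one string. */
    private void doSample() {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement element : target.getStackTrace()) {
            sb.append(element).append('\n');
        }
        synchronized (samples) {
            samples.add(sb.toString());
        }
    }

    int sampleCount() {
        synchronized (samples) {
            return samples.size();
        }
    }
}
```

A caller would construct it with the thread to watch, call start() when a dispatch begins, and call stop() when the dispatch ends, mirroring startDump() and stopDump().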

Once start() has been called, the following Runnable runs after the initial delay:

    private Runnable mRunnable = new Runnable() {
        @Override
        public void run() {
            doSample();

            if (mShouldSample.get()) {
                HandlerThreadFactory.getTimerThreadHandler()
                        .postDelayed(mRunnable, mSampleInterval);
            }
        }
    };

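The mShouldSample flag also ensures that a second stackSampler.start() has no effect while sampling is already in progress. In run() we can see that doSample() is executed again every mSampleInterval for as long as sampling is active. For the stack sampler this is StackSampler's doSample():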

    @Override
    protected void doSample() {
        StringBuilder stringBuilder = new StringBuilder();

        for (StackTraceElement stackTraceElement : mCurrentThread.getStackTrace()) {
            stringBuilder
                    .append(stackTraceElement.toString())
                    .append(BlockInfo.SEPARATOR);
        }

        synchronized (sStackMap) {
            if (sStackMap.size() == mMaxEntryCount && mMaxEntryCount > 0) {
                sStackMap.remove(sStackMap.keySet().iterator().next());
            }
            sStackMap.put(System.currentTimeMillis(), stringBuilder.toString());
        }
    }

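mCurrentThread is the main thread object. The first sample is taken after the delay returned by getSampleDelay(), which is 0.8 * mSampleInterval, where mSampleInterval is treated here as the block threshold; the main thread's stack trace is captured and stored in sStackMap. The reasoning is as follows. A dispatch that runs longer than mSampleInterval counts as a block. If it has already run for 0.8 times mSampleInterval without finishing, stack sampling begins. If it then finishes at 0.9 * mSampleInterval, no block is reported. If it runs past mSampleInterval, the stack captured at the 0.8 * mSampleInterval mark is taken as the stack that caused the delay.

As long as the dispatch has not finished, the stack is captured again every mSampleInterval and stored. Sampling starts as early as 0.8 * mSampleInterval to get an accurate stack: a dispatch that has already taken that long and is still running will very likely end up as a block, so capturing the stack early makes it more likely that the recorded call stack is the one responsible. The repeated captures every mSampleInterval afterwards give a more complete picture of a long block. Sampling stops once the dispatch completes.

At most 100 stack traces are kept (once the map holds mMaxEntryCount entries, doSample() removes the first one before adding a new one), each keyed by the timestamp at which it was captured. When a block is detected, the stack traces saved in sStackMap have to be reported, and they are looked up by timestamp: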

    private void notifyBlockEvent(final long endTime) {
        final long startTime = mStartTimestamp;
        final long startThreadTime = mStartThreadTimestamp;
        final long endThreadTime = SystemClock.currentThreadTimeMillis();
        HandlerThreadFactory.getWriteLogThreadHandler().post(new Runnable() {
            @Override
            public void run() {
                mBlockListener.onBlockEvent(startTime, endTime, startThreadTime, endThreadTime);
            }
        });
    }

Next, look at mBlockListener.onBlockEvent(startTime, endTime, startThreadTime, endThreadTime). During initialization the BlockCanaryInternals constructor calls setMonitor with a LooperMonitor whose BlockListener implements onBlockEvent:

    public BlockCanaryInternals() {

        stackSampler = new StackSampler(
                Looper.getMainLooper().getThread(),
                sContext.provideDumpInterval());

        cpuSampler = new CpuSampler(sContext.provideDumpInterval());

        setMonitor(new LooperMonitor(new LooperMonitor.BlockListener() {

            @Override
            public void onBlockEvent(long realTimeStart, long realTimeEnd,
                                     long threadTimeStart, long threadTimeEnd) {
                // Get recent thread-stack entries and cpu usage
                ArrayList<String> threadStackEntries = stackSampler
                        .getThreadStackEntries(realTimeStart, realTimeEnd);
                if (!threadStackEntries.isEmpty()) {
                    BlockInfo blockInfo = BlockInfo.newInstance()
                            .setMainThreadTimeCost(realTimeStart, realTimeEnd, threadTimeStart, threadTimeEnd)
                            .setCpuBusyFlag(cpuSampler.isCpuBusy(realTimeStart, realTimeEnd))
                            .setRecentCpuRate(cpuSampler.getCpuRateInfo())
                            .setThreadStackEntries(threadStackEntries)
                            .flushString();
                    LogWriter.save(blockInfo.toString());

                    if (mInterceptorChain.size() != 0) {
                        for (BlockInterceptor interceptor : mInterceptorChain) {
                            interceptor.onBlock(getContext().provideContext(), blockInfo);
                        }
                    }
                }
            }
        }, getContext().provideBlockThreshold(), getContext().stopWhenDebugging()));

        LogWriter.cleanObsolete();
    }

Now look at ArrayList<String> threadStackEntries = stackSampler.getThreadStackEntries(realTimeStart, realTimeEnd), which retrieves the stack traces:

    public ArrayList<String> getThreadStackEntries(long startTime, long endTime) {
        ArrayList<String> result = new ArrayList<>();
        synchronized (sStackMap) {
            for (Long entryTime : sStackMap.keySet()) {
                if (startTime < entryTime && entryTime < endTime) {
                    result.add(BlockInfo.TIME_FORMATTER.format(entryTime)
                            + BlockInfo.SEPARATOR
                            + BlockInfo.SEPARATOR
                            + sStackMap.get(entryTime));
                }
            }
        }
        return result;
    }

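This method compares the start time and end time with the keys of the saved stack traces, that is, the times at which they were captured. Only the stack traces captured strictly between the start time and the end time are relevant. If a dispatch ran for 3 seconds, all stack traces sampled during those 3 seconds are returned. They are then wrapped in a BlockInfo, passed on to the interceptors, and saved to a file. At this point the stack traces for the blocking call have been collected.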
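The two operations on the stack map (bounded insertion keyed by timestamp, then a range query between start and end) can be sketched in isolation. This is an illustrative class, not BlockCanary's: StackHistory and its method names are hypothetical, and it assumes an insertion-ordered LinkedHashMap, which is what removing "the first key" to drop the oldest entry suggests.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bounded, timestamp-keyed store of stack dumps. */
final class StackHistory {
    private final int maxEntries; // a value <= 0 means "unbounded", as in the original check
    // Assumption: insertion order is kept, so the first key is the oldest sample.
    private final LinkedHashMap<Long, String> byTime = new LinkedHashMap<>();

    StackHistory(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Stores one dump, evicting the oldest entry first when the store is full. */
    synchronized void record(long timestampMs, String stack) {
        if (maxEntries > 0 && byTime.size() == maxEntries) {
            Long oldest = byTime.keySet().iterator().next();
            byTime.remove(oldest);
        }
        byTime.put(timestampMs, stack);
    }

    /** Returns the dumps captured strictly between startMs and endMs, oldest first. */
    synchronized List<String> between(long startMs, long endMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> entry : byTime.entrySet()) {
            long capturedAt = entry.getKey();
            if (startMs < capturedAt && capturedAt < endMs) {
                result.add(capturedAt + ": " + entry.getValue());
            }
        }
        return result;
    }

    synchronized int size() {
        return byTime.size();
    }
}
```

Both bounds are exclusive, as in getThreadStackEntries, so a sample stamped exactly at the start or end of the dispatch is left out.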

Collecting CPU information

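The current CPU usage: if CPU usage is too high, the CPU may not keep up, so a method can be suspended partway through and has to wait until it is scheduled again.
Whether the CPU is currently too busy to keep up: for the same reason, a busy CPU can cause a method to be suspended partway through, and it only resumes when the scheduler next gives it CPU time.
The CPU usage of the current app.
User-mode and system-mode usage.
%ioWait: first of all, a rise in %iowait does not prove that more processes are waiting on I/O, nor that the total time spent waiting on I/O has increased.
    For example, I/O that happens while the CPU is busy does not change %iowait no matter how much or how little I/O there is; when the CPU becomes less busy, part of that I/O falls into CPU idle periods and %iowait goes up.
    As another example, when I/O concurrency is low, %iowait is high; when I/O concurrency is high, %iowait may be lower.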

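So %iowait is a very fuzzy metric. If %iowait rises, also check whether the I/O volume has clearly increased, whether metrics such as avserv/avwait/avque have clearly grown, and whether the application feels slower. If none of these is true, there is nothing to worry about.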
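To make the metrics above concrete, here is a rough sketch of how such percentages can be derived from two readings of the aggregate "cpu" line in Linux's /proc/stat, whose first seven counters are user, nice, system, idle, iowait, irq and softirq. CpuSnapshot and usageBetween are hypothetical names, the CpuSampler source is not shown in this post so this is not presented as its implementation, and counting everything except idle time as "busy" is an assumption of the sketch.

```java
/** Hypothetical helper; parses the aggregate "cpu" line of Linux /proc/stat. */
final class CpuSnapshot {
    final long user, nice, system, idle, ioWait, irq, softIrq;

    private CpuSnapshot(long[] f) {
        user = f[0];
        nice = f[1];
        system = f[2];
        idle = f[3];
        ioWait = f[4];
        irq = f[5];
        softIrq = f[6];
    }

    /** Expects a line such as "cpu  100 0 50 800 50 0 0"; later fields are ignored. */
    static CpuSnapshot parse(String cpuLine) {
        String[] parts = cpuLine.trim().split("\\s+");
        if (parts.length < 8 || !parts[0].equals("cpu")) {
            throw new IllegalArgumentException("not an aggregate cpu line: " + cpuLine);
        }
        long[] fields = new long[7];
        for (int i = 0; i < 7; i++) {
            fields[i] = Long.parseLong(parts[i + 1]);
        }
        return new CpuSnapshot(fields);
    }

    long total() {
        return user + nice + system + idle + ioWait + irq + softIrq;
    }

    /** Percentages over the interval between two snapshots: {busy, user, system, ioWait}. */
    static double[] usageBetween(CpuSnapshot prev, CpuSnapshot cur) {
        long deltaTotal = cur.total() - prev.total();
        if (deltaTotal <= 0) {
            return new double[] {0, 0, 0, 0}; // no time elapsed between the snapshots
        }
        double scale = 100.0 / deltaTotal;
        return new double[] {
            (deltaTotal - (cur.idle - prev.idle)) * scale, // assumed: busy = everything but idle
            (cur.user - prev.user) * scale,
            (cur.system - prev.system) * scale,
            (cur.ioWait - prev.ioWait) * scale,
        };
    }
}
```

The percentages only have meaning as deltas between two snapshots, since the counters in /proc/stat accumulate from boot. Note also that access to /proc/stat may be restricted for apps on newer Android versions.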

BlockCanary's core components

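LooperMonitor measures how long each dispatch takes
StackSampler collects stack traces
CpuSampler collects CPU information
Notifying about and displaying blocks, and persisting the related information to files; these parts are fairly simple and are not covered in detail here
Putting it all together, BlockCanary's overall flow looks like this, borrowing a diagram from BlockCanary's GitHub repository: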

flow.png