BlockCanary, an App Jank Detection Tool: Usage and Internals

2020-12-12 邱穆

Preface

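In a complex project with a large legacy codebase, complicated business logic, and all kinds of third-party libraries, it is hard to pin down what went wrong when the UI stutters. Even if you know which Activity or Fragment is involved, classes that run to thousands of lines and calls that jump back and forth usually mean the investigation is quietly abandoned.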

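In fact, jank is often not reproducible on demand. It may depend on the device model, the environment, or the user's actions, so it shows up sporadically. Even when it does happen, digging through a mountain of logcat output will not necessarily reveal the cause.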

BlockCanary was built to solve this problem. No more manual instrumentation and debugging: you can see at a glance where the stutter is.

1. Introduction

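BlockCanary is a non-intrusive performance monitoring component for Android. The app only needs to extend a single class that supplies the context the component requires. After that, main-thread slowdowns are detected during everyday use of the app, and the information the component provides can be used to find the cause and fix it. Official project: Android Performance Monitor.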

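BlockCanary monitors main-thread work completely transparently and outputs useful information that helps developers analyze and locate the problem and optimize the app quickly. Its features are: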

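Non-intrusive: two simple lines turn monitoring on, with no need to scatter instrumentation points that spoil the code.
Precise: the output helps pinpoint the problem (down to the line), with no need to dig slowly through Logcat.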

It currently includes the core monitor that writes log files, plus a UI for displaying block information.

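Current limitation: BlockCanary needs CPU information, but since API 26 (Android O) ordinary apps, unlike system apps, can no longer read the /proc/stat file, which leaves the tool close to useless on newer devices. That does not stop us from studying it.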
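
For context, CPU usage on Linux is typically derived from the aggregate cpu line of /proc/stat by comparing two snapshots. The following is a minimal plain-Java sketch of that calculation. The class and method names are hypothetical, the code is not BlockCanary's own sampler, and it treats iowait as busy time for simplicity.

```java
// Hypothetical helper, not BlockCanary code: derives a CPU busy ratio from
// two snapshots of the aggregate "cpu" line of /proc/stat.
final class ProcStatSketch {

    /** Parses "cpu  user nice system idle iowait irq softirq ..." into counters. */
    static long[] parseCpuLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 5 || !parts[0].equals("cpu")) {
            throw new IllegalArgumentException("not an aggregate cpu line: " + line);
        }
        long[] counters = new long[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            counters[i - 1] = Long.parseLong(parts[i]);
        }
        return counters;
    }

    /** Fraction of time spent non-idle between two snapshots, in [0, 1]. */
    static double busyRatio(long[] earlier, long[] later) {
        // Guest counters, when present, are already included in user/nice,
        // so only the first eight fields (user..steal) are summed.
        int fields = Math.min(8, Math.min(earlier.length, later.length));
        long totalDelta = 0;
        for (int i = 0; i < fields; i++) {
            totalDelta += later[i] - earlier[i];
        }
        if (totalDelta <= 0) {
            return 0.0;
        }
        long idleDelta = later[3] - earlier[3]; // index 3 is the idle counter
        return (double) (totalDelta - idleDelta) / totalDelta;
    }
}
```

With two made-up snapshots in which idle grows by 900 ticks and user plus system by 100, busyRatio returns 0.1. On API 26 and later, the step that fails for ordinary apps is reading the file in the first place, not this arithmetic.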

2. How to Use

2.1 Adding the dependency

dependencies {
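    // Option A: enable BlockCanary in every build type.
    // (Gradle 7+ removed the `compile` family; use `implementation`,
    // `debugImplementation` and `releaseImplementation` there instead.)
    compile 'com.github.markzhai:blockcanary-android:1.5.0'

    // Option B: monitor and notify only in debug builds; release builds get the no-op artifact.
    debugCompile 'com.github.markzhai:blockcanary-android:1.5.0'
    releaseCompile 'com.github.markzhai:blockcanary-no-op:1.5.0'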
}

2.2 Usage

In your Application class:

public class DemoApplication extends Application {
    @Override
    public void onCreate() {
        super.onCreate();
        // Initialize in the main process
        BlockCanary.install(this, new AppBlockCanaryContext()).start();
    }
}

Extend BlockCanaryContext to implement your own AppBlockCanaryContext:

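import android.content.Context;

import com.github.moduth.blockcanary.BlockCanaryContext;
import com.github.moduth.blockcanary.internal.BlockInfo;

import java.io.File;
import java.util.LinkedList;
import java.util.List;

public class AppBlockCanaryContext extends BlockCanaryContext {
    // Provide the various context values: app qualifier, user uid, network type,
    // block threshold, log path, and so on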

    /**
     * Implement in your project.
     *
     * @return Qualifier which can specify this installation, like version + flavor.
     */
    public String provideQualifier() {
        return "unknown";
    }

    /**
     * Implement in your project.
     *
     * @return user id
     */
    public String provideUid() {
        return "uid";
    }

    /**
     * Network type
     *
     * @return {@link String} like 2G, 3G, 4G, wifi, etc.
     */
    public String provideNetworkType() {
        return "unknown";
    }

    /**
     * Config monitor duration, after this time BlockCanary will stop, use
     * with {@code BlockCanary}'s isMonitorDurationEnd
     *
     * @return monitor last duration (in hour)
     */
    public int provideMonitorDuration() {
        return -1;
    }

    /**
     * Config block threshold (in millis), dispatch over this duration is regarded as a BLOCK. You may set it
     * from performance of device.
     *
     * @return threshold in millis
     */
    public int provideBlockThreshold() {
        return 1000;
    }

    /**
     * Thread stack dump interval, use when block happens, BlockCanary will dump on main thread
     * stack according to current sample cycle.
     * <p>
     * Because the implementation mechanism of Looper, real dump interval would be longer than
     * the period specified here (especially when cpu is busier).
     * </p>
     *
     * @return dump interval (in millis)
     */
    public int provideDumpInterval() {
        return provideBlockThreshold();
    }

    /**
     * Path to save log, like "/blockcanary/", will save to sdcard if can.
     *
     * @return path of log files
     */
    public String providePath() {
        return "/blockcanary/";
    }

    /**
     * If need notification to notice block.
     *
     * @return true if a notification is needed, false otherwise.
     */
    public boolean displayNotification() {
        return true;
    }

    /**
     * Implement in your project, bundle files into a zip file.
     *
     * @param src  files before compress
     * @param dest files compressed
     * @return true if compression is successful
     */
    public boolean zip(File[] src, File dest) {
        return false;
    }

    /**
     * Implement in your project, bundled log files.
     *
     * @param zippedFile zipped file
     */
    public void upload(File zippedFile) {
        throw new UnsupportedOperationException();
    }


    /**
     * Packages that developer concern, by default it uses process name,
     * put high priority one in pre-order.
     *
     * @return null if simply concern only package with process name.
     */
    public List<String> concernPackages() {
        return null;
    }

    /**
     * Filter out stacks that contain no frame from a concern package, used with {@code concernPackages}.
     *
     * @return true to filter, false otherwise.
     */
    public boolean filterNonConcernStack() {
        return false;
    }

    /**
     * Provide white list, entry in white list will not be shown in ui list.
     *
     * @return return null if you don't need white-list filter.
     */
    public List<String> provideWhiteList() {
        LinkedList<String> whiteList = new LinkedList<>();
        whiteList.add("org.chromium");
        return whiteList;
    }

    /**
     * Whether to delete files whose stack is in white list, used with white-list.
     *
     * @return true to delete, false otherwise.
     */
    public boolean deleteFilesInWhiteList() {
        return true;
    }

    /**
     * Block interceptor, developer may provide their own actions.
     */
    public void onBlock(Context context, BlockInfo blockInfo) {

    }
}

3. How It Works

See also my previous article: 安卓中的消息循环模型 (The Message Loop Model in Android)

BlockCanary relies on Android's message handling mechanism. Looper.java contains the following:

private static Looper sMainLooper;  // guarded by Looper.class

...

/**
 * Initialize the current thread as a looper, marking it as an
 * application's main looper. The main looper for your application
 * is created by the Android environment, so you should never need
 * to call this function yourself.  See also: {@link #prepare()}
 */
public static void prepareMainLooper() {
    prepare(false);
    synchronized (Looper.class) {
        if (sMainLooper != null) {
            throw new IllegalStateException("The main Looper has already been prepared.");
        }
        sMainLooper = myLooper();
    }
}

/** Returns the application's main looper, which lives in the main thread of the application.
 */
public static Looper getMainLooper() {
    synchronized (Looper.class) {
        return sMainLooper;
    }
}

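In other words, the application's main thread has exactly one Looper. No matter how many Handlers there are, every message ends up going through it.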

The loop() method of Looper contains this:

public static void loop() {
    ...

    for (;;) {
        ...

        // This must be in a local variable, in case a UI event sets the logger
        Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " +
                    msg.callback + ": " + msg.what);
        }

        msg.target.dispatchMessage(msg);

        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }

        ...
    }
}

mLogging is invoked before and after each message is handled, so if the main thread is stuck, it is stuck inside dispatchMessage.
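
To make the bracketing concrete, here is a toy message loop in plain Java. MiniLooper and LinePrinter are hypothetical stand-ins for android.os.Looper and android.util.Printer, written only to illustrate the hook; they are not Android classes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for android.util.Printer.
interface LinePrinter {
    void println(String line);
}

// Hypothetical, heavily simplified stand-in for android.os.Looper.
final class MiniLooper {
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private LinePrinter logging; // plays the role of Looper.mLogging

    void setMessageLogging(LinePrinter printer) {
        this.logging = printer;
    }

    void post(Runnable message) {
        queue.add(message);
    }

    /** Drains the queue, bracketing every dispatch with the two log lines. */
    void loopOnce() {
        Runnable msg;
        while ((msg = queue.poll()) != null) {
            LinePrinter printer = logging; // local copy, as in Looper.loop()
            if (printer != null) {
                printer.println(">>>>> Dispatching to " + msg);
            }
            msg.run(); // a slow message blocks the loop right here
            if (printer != null) {
                printer.println("<<<<< Finished to " + msg);
            }
        }
    }
}
```

Any printer installed through setMessageLogging therefore sees exactly two calls per message, and the time between them is the time that message kept the thread busy.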

Core flowchart (image from the library author's blog):

[Figure: flowchart]

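BlockCanary starts a thread that records the UI thread's current stack. The stack information and the CPU information are stored in mThreadStackEntries and mCpuInfoEntries respectively, with each entry keyed by its timestamp.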
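
A timestamp-keyed buffer of this kind can be sketched in a few lines of plain Java. SampleBuffer is a hypothetical name, and the choice of a TreeMap with a size cap is an assumption made for the example; the real mThreadStackEntries field may be implemented differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of a bounded, timestamp-keyed sample buffer.
final class SampleBuffer {
    private final TreeMap<Long, String> entries = new TreeMap<>();
    private final int maxEntries;

    SampleBuffer(int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    /** Stores one sample; the oldest samples are evicted once the buffer is full. */
    synchronized void put(long timestampMillis, String sample) {
        entries.put(timestampMillis, sample);
        while (entries.size() > maxEntries) {
            entries.pollFirstEntry();
        }
    }

    /** Returns the samples taken in [startMillis, endMillis], oldest first. */
    synchronized List<String> between(long startMillis, long endMillis) {
        if (startMillis > endMillis) {
            return new ArrayList<>();
        }
        return new ArrayList<>(entries.subMap(startMillis, true, endMillis, true).values());
    }
}
```

Keying by timestamp is what makes it possible to ask later for only the samples that fall inside the interval of one slow message.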

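BlockCanary registers a logging Printer to obtain the start and end time of each event. If handling an event takes longer than the threshold (1 s by default), it looks up the stack samples recorded between T1 and T2 in mThreadStackEntries, and the CPU and memory information for the same T1 to T2 interval in mCpuInfoEntries. It then formats the information, saves it to a local file, and notifies the user.

The component makes use of the main thread's message queue mechanism. It calls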

Looper.getMainLooper().setMessageLogging(mainLooperPrinter);

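and, inside mainLooperPrinter, distinguishes the start call from the end call to obtain the times at which the main thread begins and finishes dispatching the message. If the elapsed time exceeds the threshold (for example 2000 ms), a main-thread block is considered to have occurred, and various information is dumped so that developers can analyze the performance bottleneck.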

...
@Override
public void println(String x) {
    if (!mStartedPrinting) {
        mStartTimeMillis = System.currentTimeMillis();
        mStartThreadTimeMillis = SystemClock.currentThreadTimeMillis();
        mStartedPrinting = true;
    } else {
        final long endTime = System.currentTimeMillis();
        mStartedPrinting = false;
        if (isBlock(endTime)) {
            notifyBlockEvent(endTime);
        }
    }
}

private boolean isBlock(long endTime) {
    return endTime - mStartTimeMillis > mBlockThresholdMillis;
}
...
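
The toggle logic above can be exercised without Android or real delays by injecting the clock. The following is a simplified sketch under that assumption; BlockDetectorSketch is a hypothetical name, and recording durations in a list stands in for notifyBlockEvent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical, simplified version of the start/end toggle, with an injected clock.
final class BlockDetectorSketch {
    private final long thresholdMillis;
    private final LongSupplier clock;
    private final List<Long> blockedDurations = new ArrayList<>();
    private boolean started;
    private long startMillis;

    BlockDetectorSketch(long thresholdMillis, LongSupplier clock) {
        this.thresholdMillis = thresholdMillis;
        this.clock = clock;
    }

    /** Called twice per message: once before dispatch and once after it. */
    void println(String line) {
        if (!started) {
            startMillis = clock.getAsLong();
            started = true;
        } else {
            long elapsed = clock.getAsLong() - startMillis;
            started = false;
            if (elapsed > thresholdMillis) {
                blockedDurations.add(elapsed); // stands in for notifyBlockEvent
            }
        }
    }

    List<Long> blockedDurations() {
        return blockedDurations;
    }
}
```

Note that the detector never parses the log line. It relies purely on the calls arriving in strict start/end pairs, which is also why a Printer installed in the middle of a dispatch would start out of phase.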

4. Reading the Source

BlockCanary.install(this, new AppBlockCanaryContext()).start();

Let's start with the entry point, the install method:

/**
 * Install {@link BlockCanary}
 *
 * @param context            Application context
 * @param blockCanaryContext BlockCanary context
 * @return {@link BlockCanary}
 */
public static BlockCanary install(Context context, BlockCanaryContext blockCanaryContext) {
    BlockCanaryContext.init(context, blockCanaryContext);
    setEnabled(context, DisplayActivity.class, BlockCanaryContext.get().displayNotification());
    return get();
}

These three lines do the following:

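Call init() to record the Application and the BlockCanaryContext, which supply the Context and the configuration parameters (block threshold, whether to show notifications, and so on) used later.
Call setEnabled() to decide whether the yellow logo icon is shown on the launcher.
Call get() to create the BlockCanary instance, which in turn creates a BlockCanaryInternals instance and assigns it to the mBlockCanaryCore field for handling the rest of the flow.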
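
The body of get() is not shown in this article. One common way to write such a lazily created, process-wide instance is double-checked locking, sketched below with a made-up class name. This is an assumption about the general pattern, not a copy of BlockCanary's code.

```java
// Hypothetical sketch of a lazily created singleton with double-checked locking.
final class MonitorSingleton {
    // volatile so that a fully constructed instance is visible to all threads
    private static volatile MonitorSingleton sInstance;

    private MonitorSingleton() {
        // A real monitor would create its internal helpers here.
    }

    static MonitorSingleton get() {
        MonitorSingleton local = sInstance;
        if (local == null) {
            synchronized (MonitorSingleton.class) {
                local = sInstance;
                if (local == null) {
                    local = new MonitorSingleton();
                    sInstance = local;
                }
            }
        }
        return local;
    }
}
```

Every call to get() returns the same object, so install() and any later caller share one monitor.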

static void init(Context context, BlockCanaryContext blockCanaryContext) {
    sApplicationContext = context;
    sInstance = blockCanaryContext;
}

init simply performs assignments: it stores the Context and the BlockCanaryContext that we passed in.

Next, let's see what BlockCanary.start() does:

public void start() {
    if (!mMonitorStarted) {
        mMonitorStarted = true;
        Looper.getMainLooper().setMessageLogging(mBlockCanaryCore.monitor);
    }
}

start() does only one thing: it sets a Printer on the Looper.

As a result, the println() method of mBlockCanaryCore.monitor is called before and after the Looper handles each message.

mBlockCanaryCore.monitor is a LooperMonitor, a member field of BlockCanaryInternals:

class LooperMonitor implements Printer {
    ...
    @Override
    public void println(String x) {
        // Skip detection while a debugger is attached, if mStopWhenDebugging is set
        if (mStopWhenDebugging && Debug.isDebuggerConnected()) {
            return;
        }
        if (!mPrintingStarted) {
            mStartTimestamp = System.currentTimeMillis();
            mStartThreadTimestamp = SystemClock.currentThreadTimeMillis();
            mPrintingStarted = true;
            startDump();  // collect call stacks and CPU info on a worker thread
        } else {
            final long endTime = System.currentTimeMillis();
            mPrintingStarted = false;
            if (isBlock(endTime)) {  // did the dispatch exceed the configured threshold?
                notifyBlockEvent(endTime);
            }
            stopDump(); // stop collecting call stacks and CPU info
        }
    }
    // Checks whether the elapsed time exceeds the configured threshold
    private boolean isBlock(long endTime) {
        return endTime - mStartTimestamp > mBlockThresholdMillis;
    }
    ...
}

LooperMonitor's println() is the core of the whole library, and its implementation is simple:

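Before the Looper handles a message, record the current time and call startDump() to start a task that periodically samples the call stack, CPU, and other information.
After the Looper finishes the message, take the current time and check isBlock(endTime) against our configured threshold. If it is exceeded, call notifyBlockEvent(endTime) to trigger the rest of the flow.
Call stopDump() to stop the stack and CPU sampling task.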
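
The startDump()/stopDump() pair described above can be approximated in plain Java as follows. This is a hypothetical sketch: StackSamplerSketch is a made-up name, and a ScheduledExecutorService stands in for whatever worker thread the library actually uses on Android.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a start/stop stack sampler for one target thread.
final class StackSamplerSketch {
    private final Thread target;
    private final long intervalMillis;
    private final List<String> samples = new CopyOnWriteArrayList<>();
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "stack-sampler");
                t.setDaemon(true); // never keep the process alive just for sampling
                return t;
            });
    private ScheduledFuture<?> task;

    StackSamplerSketch(Thread target, long intervalMillis) {
        if (intervalMillis <= 0) {
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        this.target = target;
        this.intervalMillis = intervalMillis;
    }

    /** Begins sampling the target thread's stack at a fixed interval. */
    synchronized void startDump() {
        if (task == null) {
            task = executor.scheduleAtFixedRate(
                    this::sampleOnce, 0, intervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Stops sampling; the samples collected so far are kept. */
    synchronized void stopDump() {
        if (task != null) {
            task.cancel(false);
            task = null;
        }
    }

    private void sampleOnce() {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement element : target.getStackTrace()) {
            sb.append(element).append('\n');
        }
        samples.add(System.currentTimeMillis() + "\n" + sb);
    }

    List<String> samples() {
        return samples;
    }
}
```

Sampling from a separate thread is the key design point: the thread being observed is, by definition, too busy to report on itself while it is blocked.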

The collected information includes:

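Basic information: device model, number of CPU cores, process name, memory, version number, and so on
Timing information: actual elapsed time, main-thread clock time, block start and end times
CPU information: whether the CPU was busy during the interval, system and app CPU share during the interval, share of CPU spent on I/O
Stack information: the most recent stacks before the block occurred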
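
The difference between actual elapsed time and thread clock time is worth a small illustration: a thread that waits uses wall-clock time but almost no CPU time. The sketch below shows this on a plain JVM, using ThreadMXBean in place of Android's SystemClock.currentThreadTimeMillis(). The class name is hypothetical, and the code assumes the JVM supports per-thread CPU time measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical plain-JVM illustration of wall-clock time versus thread CPU time.
final class ElapsedVsThreadTime {

    /** Returns {wallMillis, threadCpuMillis} spent running the task on the calling thread. */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        task.run();
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return new long[] { wallNanos / 1_000_000, cpuNanos / 1_000_000 };
    }
}
```

If the wall-clock figure is large while the thread figure is small, the main thread was mostly waiting (on a lock, I/O, or the scheduler) rather than computing, which points the investigation in a different direction than a CPU-bound block would.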

5. Summary

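BlockCanary makes neat use of Android's message mechanism: it sets a Printer on the Looper, records stack and CPU information, and measures how long the main thread takes to handle each message. If that time exceeds the threshold, it retrieves the stack and CPU information for the interval to help analyze the cause of the block.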

BlockCanary: Easily Find the Culprit Behind Android App UI Jank
GitHub: BlockCanary