An Analysis of How Android Watchdog Works
2020-01-08
锄禾豆
Overview
Understanding how Watchdog works gives a clearer picture of how Android system services run.
Analysis
1. Watchdog extends Thread
Watchdog is a thread.
2. It is started in SystemServer.java
private void startOtherServices() {
······
traceBeginAndSlog("InitWatchdog");
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
traceEnd();
······
traceBeginAndSlog("StartWatchdog");
Watchdog.getInstance().start();
traceEnd();
}
Because Watchdog is a Thread, SystemServer only has to initialize it with init() and then call start().
3. The Watchdog constructor
private Watchdog() {
super("watchdog");
// Initialize handler checkers for each common thread we want to check. Note
// that we are not currently checking the background thread, since it can
// potentially hold longer running operations with no guarantees about the timeliness
// of operations there.
// The shared foreground thread is the main checker. It is where we
// will also dispatch monitor checks and do other work.
mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
"foreground thread", DEFAULT_TIMEOUT);
mHandlerCheckers.add(mMonitorChecker);
// Add checker for main thread. We only do a quick check since there
// can be UI running on the thread.
mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
"main thread", DEFAULT_TIMEOUT));
// Add checker for shared UI thread.
mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
"ui thread", DEFAULT_TIMEOUT));
// And also check IO thread.
mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
"i/o thread", DEFAULT_TIMEOUT));
// And the display thread.
mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
"display thread", DEFAULT_TIMEOUT));
// Initialize monitor for Binder threads.
addMonitor(new BinderThreadMonitor());
mOpenFdMonitor = OpenFdMonitor.create();
// See the notes on DEFAULT_TIMEOUT.
assert DB ||
DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
// mtk enhance
exceptionHWT = new ExceptionLog();
}
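1. The two objects to focus on are mMonitorChecker and mHandlerCheckers.
2. Where the entries in mHandlerCheckers come from:
1) Added by the constructor: FgThread, the main thread, UiThread, IoThread and DisplayThread.
2) Added from outside: Watchdog.getInstance().addThread(handler);
3. Where the monitors held by mMonitorChecker (in its mMonitors list) come from:
Added from outside: Watchdog.getInstance().addMonitor(monitor);
Special case: the constructor itself calls addMonitor(new BinderThreadMonitor());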
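The split between the two kinds of registration is easier to see in a small model. Below is a hypothetical plain-Java sketch. It assumes a String can stand in for each Handler. It also leaves out the BinderThreadMonitor registration and the per-checker timeouts, and only shows where addMonitor() and addThread() put things. WatchdogRegistry is an illustrative name, not an AOSP class.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of how Watchdog organizes its checkers.
// Names echo AOSP (HandlerChecker, Monitor, addMonitor, addThread), but a
// String stands in for the Handler and nothing here is Android code.
class WatchdogRegistry {
    interface Monitor {
        void monitor();
    }

    static final class HandlerChecker {
        final String threadName; // stand-in for the Handler/Looper being checked
        final List<Monitor> monitors = new ArrayList<>();

        HandlerChecker(String threadName) {
            this.threadName = threadName;
        }
    }

    final List<HandlerChecker> handlerCheckers = new ArrayList<>();
    final HandlerChecker monitorChecker;

    WatchdogRegistry() {
        // The foreground-thread checker doubles as the monitor checker.
        monitorChecker = new HandlerChecker("foreground thread");
        handlerCheckers.add(monitorChecker);
        // Built-in threads that are checked from the start.
        for (String name : new String[] {"main thread", "ui thread", "i/o thread", "display thread"}) {
            handlerCheckers.add(new HandlerChecker(name));
        }
    }

    // addMonitor(): every monitor lands on the single foreground checker.
    synchronized void addMonitor(Monitor monitor) {
        monitorChecker.monitors.add(monitor);
    }

    // addThread(): each extra thread gets a checker of its own, with no monitors.
    synchronized void addThread(String threadName) {
        handlerCheckers.add(new HandlerChecker(threadName));
    }
}
```

All monitors share one checker, so they run one after another on the foreground thread. A single stuck lock is therefore enough to stall that checker.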
4. Watchdog.run()
public void run() {
boolean waitedHalf = false;
boolean mSFHang = false;
while (true) {
······
synchronized (this) {
······
for (int i=0; i<mHandlerCheckers.size(); i++) {
HandlerChecker hc = mHandlerCheckers.get(i);
hc.scheduleCheckLocked();
}
······
}
······
}
}
run() schedules a check on every HandlerChecker in mHandlerCheckers.
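The elided part of run() turns each checker's progress into a state and then acts on the worst state. The sketch below is modeled on HandlerChecker.getCompletionStateLocked() as I understand the AOSP versions of this period, so treat the constants and thresholds as approximate. waitMax plays the role of DEFAULT_TIMEOUT, which as far as I know is 60 s on non-debug builds. The loop is assumed to sleep about half the timeout between rounds.

```java
// Hypothetical sketch of Watchdog's per-checker state evaluation, modeled on
// HandlerChecker.getCompletionStateLocked(); not verbatim AOSP code.
class CompletionState {
    static final int COMPLETED = 0;   // the posted check has run
    static final int WAITING = 1;     // pending, but less than half the timeout has passed
    static final int WAITED_HALF = 2; // pending for at least half the timeout
    static final int OVERDUE = 3;     // pending for the whole timeout

    // completed: has the check posted at startTime run yet?
    // now/startTime: uptime in ms; waitMax: the checker's timeout in ms.
    static int evaluate(boolean completed, long startTime, long now, long waitMax) {
        if (completed) {
            return COMPLETED;
        }
        long latency = now - startTime;
        if (latency < waitMax / 2) {
            return WAITING;
        }
        if (latency < waitMax) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // The watchdog loop reacts to the worst state across all checkers.
    static int worst(int... states) {
        int max = COMPLETED;
        for (int s : states) {
            max = Math.max(max, s);
        }
        return max;
    }
}
```

Under this reading, WAITED_HALF triggers the first stack dump. Only OVERDUE leads to the kill described in section 6.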
5. HandlerChecker.scheduleCheckLocked()
public void scheduleCheckLocked() {
if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
// If the target looper has recently been polling, then
// there is no reason to enqueue our checker on it since that
// is as good as it not being deadlocked. This avoid having
// to do a context switch to check the thread. Note that we
// only do this if mCheckReboot is false and we have no
// monitors, since those would need to be executed at this point.
mCompleted = true;
return;
}
if (!mCompleted) {
// we already have a check in flight, so no need
return;
}
mCompleted = false;
mCurrentMonitor = null;
mStartTime = SystemClock.uptimeMillis();
mHandler.postAtFrontOfQueue(this);
}
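The postAtFrontOfQueue() trick can be reproduced outside Android to see why it works. The check marker jumps ahead of any queued messages, so it only fails to run if the thread is stuck in the message it is handling right now. This is a plain-Java sketch under that assumption. A worker draining a LinkedBlockingDeque stands in for a Looper, QueueChecker is a hypothetical name, and the isPolling() shortcut is left out.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

// Plain-Java analog of a Looper thread plus a HandlerChecker. Hypothetical
// names; offerFirst() stands in for Handler.postAtFrontOfQueue().
class QueueChecker {
    final BlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();
    volatile boolean completed = true; // like mCompleted
    volatile long startTimeMs;         // like mStartTime

    QueueChecker() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    queue.takeFirst().run(); // like Looper.loop() dispatching one message
                }
            } catch (InterruptedException e) {
                // stop when interrupted
            }
        }, "simulated-looper");
        worker.setDaemon(true);
        worker.start();
    }

    // Ordinary work goes to the back of the queue, like Handler.post().
    void post(Runnable work) {
        queue.offerLast(work);
    }

    // Mirrors scheduleCheckLocked(): do nothing while a check is in flight,
    // otherwise reset the state and put a marker at the front of the queue.
    synchronized void scheduleCheck() {
        if (!completed) {
            return;
        }
        completed = false;
        startTimeMs = System.currentTimeMillis();
        queue.offerFirst(() -> completed = true);
    }

    static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A message that is already running blocks the marker, and that is exactly the situation Watchdog reports. Messages that are merely queued do not delay the check.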
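1. When mMonitors.size() == 0, the checker only has to show that its thread is not stuck. If mHandler.getLooper().getQueue().isPolling() reports that the Looper is idle and waiting for new messages, the thread is clearly not blocked. The check is then marked complete immediately, without posting anything. Otherwise the checker falls through and posts itself like any other.
2. mMonitorChecker always holds at least one monitor (BinderThreadMonitor), so it never takes that shortcut. What matters here is mHandler.postAtFrontOfQueue(this), which runs HandlerChecker.run() on the checked thread: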
public void run() {
final int size = mMonitors.size();
for (int i = 0 ; i < size ; i++) {
synchronized (Watchdog.this) {
mCurrentMonitor = mMonitors.get(i);
}
mCurrentMonitor.monitor();
}
synchronized (Watchdog.this) {
mCompleted = true;
mCurrentMonitor = null;
}
}
Technique: call monitor() on every registered Monitor, on the checker's own thread.
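HandlerChecker.run() calls each monitor() in turn, so a single monitor that cannot take its lock keeps mCompleted false. The plain-Java sketch below shows this probe on its own. FakeService, probe() and startHolder() are hypothetical. The join timeout stands in for Watchdog's periodic state check.

```java
// Minimal sketch of why an empty synchronized block works as a probe:
// monitor() can only return once it manages to take the service's lock.
// All names are hypothetical; plain Java only.
class LockProbe {
    interface Monitor {
        void monitor();
    }

    static final class FakeService implements Monitor {
        // Same shape as ActivityManagerService.monitor(): take the lock, release it.
        @Override
        public void monitor() {
            synchronized (this) { }
        }

        // Simulates another thread holding the service lock for too long.
        void holdLockFor(long ms) {
            synchronized (this) {
                try {
                    Thread.sleep(ms);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    // Runs monitor() on a helper thread; true if it returned within timeoutMs.
    static boolean probe(Monitor m, long timeoutMs) {
        Thread t = new Thread(m::monitor, "monitor-probe");
        t.setDaemon(true);
        t.start();
        try {
            t.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    // Starts a daemon thread that holds the lock, and waits until it sleeps inside it.
    static void startHolder(FakeService s, long ms) {
        Thread holder = new Thread(() -> s.holdLockFor(ms), "lock-holder");
        holder.setDaemon(true);
        holder.start();
        while (holder.isAlive() && holder.getState() != Thread.State.TIMED_WAITING) {
            Thread.yield();
        }
    }
}
```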
1) Only mMonitorChecker has entries in mMonitors. System services register themselves through addMonitor, for example:
ActivityManagerService.java
Watchdog.getInstance().addMonitor(this);
InputManagerService.java
Watchdog.getInstance().addMonitor(this);
PowerManagerService.java
Watchdog.getInstance().addMonitor(this);
WindowManagerService.java
Watchdog.getInstance().addMonitor(this);
The monitor() methods themselves are trivial. Here is ActivityManagerService's, for example:
public void monitor() {
synchronized (this) { }
}
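All this does is check whether the service's lock can be acquired. If another thread holds it for too long or deadlocks on it, monitor() blocks and the foreground checker never completes.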
2) A special case: how is BinderThreadMonitor checked?
It is an inner class of Watchdog:
private static final class BinderThreadMonitor implements Watchdog.Monitor {
@Override
public void monitor() {
Binder.blockUntilThreadAvailable();
}
}
android.os.Binder.java
public static final native void blockUntilThreadAvailable();
android_util_Binder.cpp
static void android_os_Binder_blockUntilThreadAvailable(JNIEnv* env, jobject clazz)
{
return IPCThreadState::self()->blockUntilThreadAvailable();
}
IPCThreadState.cpp
void IPCThreadState::blockUntilThreadAvailable()
{
pthread_mutex_lock(&mProcess->mThreadCountLock);
while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
static_cast<unsigned long>(mProcess->mMaxThreads));
pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
}
pthread_mutex_unlock(&mProcess->mThreadCountLock);
}
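This only checks that the number of binder threads currently executing in the process has not reached mMaxThreads. If it has (31 in system_server), the caller waits until one of them finishes.
Where 31 comes from: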
ProcessState.cpp
#define DEFAULT_MAX_BINDER_THREADS 15
SystemServer.java, however, raises the limit:
// maximum number of binder threads used for system_server
// will be higher than the system default
private static final int sMaxBinderThreads = 31;
private void run() {
······
BinderInternal.setMaxThreads(sMaxBinderThreads);
······
}
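You can mirror the native wait loop in Java with a lock and a condition to see when BinderThreadMonitor.monitor() would block. This is only a sketch. BinderPoolModel and its methods are hypothetical, and the counters imitate mExecutingThreadsCount and mMaxThreads. A timeout is added so the demo cannot hang, whereas the C++ loop waits indefinitely. That indefinite wait is what lets Watchdog notice the stall.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Java model of IPCThreadState::blockUntilThreadAvailable(). BinderPoolModel and
// its methods are hypothetical; the real code is C++ using pthread primitives.
class BinderPoolModel {
    private final ReentrantLock threadCountLock = new ReentrantLock();             // ~ mThreadCountLock
    private final Condition threadCountDecrement = threadCountLock.newCondition(); // ~ mThreadCountDecrement
    private final int maxThreads;                                                  // ~ mMaxThreads
    private int executingThreadsCount = 0;                                         // ~ mExecutingThreadsCount

    BinderPoolModel(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    // A pool thread starts executing a transaction.
    void beginTransaction() {
        threadCountLock.lock();
        try {
            executingThreadsCount++;
        } finally {
            threadCountLock.unlock();
        }
    }

    // A pool thread finishes; wake anyone waiting for a free thread.
    void endTransaction() {
        threadCountLock.lock();
        try {
            executingThreadsCount--;
            threadCountDecrement.signalAll();
        } finally {
            threadCountLock.unlock();
        }
    }

    // Same wait loop as the C++ version, plus a timeout so the demo cannot hang.
    // Returns true once fewer than maxThreads threads are executing.
    boolean blockUntilThreadAvailable(long timeoutMs) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        threadCountLock.lock();
        try {
            while (executingThreadsCount >= maxThreads) {
                if (remainingNanos <= 0) {
                    return false;
                }
                remainingNanos = threadCountDecrement.awaitNanos(remainingNanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            threadCountLock.unlock();
        }
    }
}
```

With maxThreads set to 31, as in system_server, the monitor stalls only when 31 binder threads are executing at once. If they stay busy long enough, the foreground checker goes overdue.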
6. What does Watchdog do after a timeout?
public void run() {
······
Process.killProcess(Process.myPid());
System.exit(10);
······
}
It kills its own process (system_server) and exits.
7. Questions
1) What does Watchdog log?
(1)process stack traces
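The output path is controlled by dalvik.vm.stack-trace-file or dalvik.vm.stack-trace-dir and is normally /data/anr/. The traces are dumped by:
ActivityManagerService.dumpStackTraces(true, pids, null, null, getInterestingNativePids());
Note: the traces are also dumped when a checker has been blocked for half the timeout (the WAITED_HALF state), not only once it is overdue.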
(2)slog
System log, written through android.util.Slog (a @hide class):
Slog.e(TAG, "**SWT happen **" + subject);
Slog.v(TAG, "** save all info before killnig system server **");
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
Slog.w(TAG, "*** GOODBYE!");
(3)event log
EventLog.writeEvent(EventLogTags.WATCHDOG, name.isEmpty() ? subject : name);
(4)kernel stack traces
The output path is controlled by dalvik.vm.stack-trace-file and is normally /data/anr/.
if (RECORD_KERNEL_THREADS) {
dumpKernelStackTraces();
}
private File dumpKernelStackTraces() {
String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
if (tracesPath == null || tracesPath.length() == 0) {
return null;
}
native_dumpKernelStacks(tracesPath);
return new File(tracesPath);
}
(5)dropbox
Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
public void run() {
Slog.v(TAG, "** start addErrorToDropBox **");
mActivity.addErrorToDropBox(
"watchdog", null, "system_server", null, null,
name.isEmpty() ? subject : name, null, stack, null);
}
};
dropboxThread.start();
Note:
DropBox entries are normally stored in /data/system/dropbox, because DropBoxManagerService is constructed with that directory:
DropBoxManagerService.java
public DropBoxManagerService(final Context context) {
this(context, new File("/data/system/dropbox"), FgThread.get().getLooper());
}
2) Why monitor UiThread, IoThread, DisplayThread and FgThread?
First, all four classes extend ServiceThread and are singletons. Take UiThread.java as an example:
/**
* Shared singleton thread for showing UI. This is a foreground thread, and in
* additional should not have operations that can take more than a few ms scheduled
* on it to avoid UI jank.
*/
public final class UiThread extends ServiceThread {
private static final long SLOW_DISPATCH_THRESHOLD_MS = 100;
private static UiThread sInstance;
private static Handler sHandler;
private UiThread() {
super("android.ui", Process.THREAD_PRIORITY_FOREGROUND, false /*allowIo*/);
}
@Override
public void run() {
// Make sure UiThread is in the fg stune boost group
Process.setThreadGroup(Process.myTid(), Process.THREAD_GROUP_TOP_APP);
super.run();
}
private static void ensureThreadLocked() {
if (sInstance == null) {
sInstance = new UiThread();
sInstance.start();
final Looper looper = sInstance.getLooper();
looper.setTraceTag(Trace.TRACE_TAG_ACTIVITY_MANAGER);
looper.setSlowDispatchThresholdMs(SLOW_DISPATCH_THRESHOLD_MS);
sHandler = new Handler(sInstance.getLooper());
}
}
public static UiThread get() {
synchronized (UiThread.class) {
ensureThreadLocked();
return sInstance;
}
}
public static Handler getHandler() {
synchronized (UiThread.class) {
ensureThreadLocked();
return sHandler;
}
}
}
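1. get() returns the singleton instance.
2. getHandler() returns the Handler bound to that thread.
3. ensureThreadLocked() calls start() as soon as it creates the instance. The thread is therefore already running by the time anyone obtains it.
Second, ServiceThread extends HandlerThread. The interesting part is each thread's Handler, because ActivityManagerService, WMS, PMS and other system services post work through it. For example, here is the UiHandler in ActivityManagerService: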
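The lazy create-and-start pattern of ensureThreadLocked() is easy to reproduce in plain Java. In this sketch, SharedWorker is a hypothetical name, and a LinkedBlockingQueue loop stands in for the Looper that HandlerThread provides.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Plain-Java sketch of the ServiceThread singleton pattern: the thread is
// created and started the first time anyone asks for it. Hypothetical names.
final class SharedWorker extends Thread {
    private static SharedWorker sInstance;
    private final LinkedBlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    private SharedWorker() {
        super("shared.worker");
        setDaemon(true);
    }

    @Override
    public void run() {
        try {
            while (true) {
                queue.take().run(); // plays the role of Looper.loop()
            }
        } catch (InterruptedException e) {
            // exit quietly
        }
    }

    // Like ensureThreadLocked(): lazily create AND start the thread.
    private static void ensureThreadLocked() {
        if (sInstance == null) {
            sInstance = new SharedWorker();
            sInstance.start();
        }
    }

    static SharedWorker get() {
        synchronized (SharedWorker.class) {
            ensureThreadLocked();
            return sInstance;
        }
    }

    // Stand-in for getHandler().post(work).
    static void post(Runnable work) {
        get().queue.add(work);
    }
}
```

Because start() happens inside the synchronized accessor, callers never see an instance whose thread has not been started yet.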
final class UiHandler extends Handler {
public UiHandler() {
super(com.android.server.UiThread.get().getLooper(), null, true);
}
@Override
public void handleMessage(Message msg) {
switch (msg.what) {
case SHOW_ERROR_UI_MSG: {
mAppErrors.handleShowAppErrorUi(msg);
ensureBootCompleted();
} break;
······
}
}
}
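1. UiHandler takes its Looper directly from UiThread. A thread has exactly one Looper and one MessageQueue, but any number of Handlers can be bound to them.
2. handleMessage() shows the error UI on the android.ui thread, not the main thread. UI therefore does not have to be updated from the main thread; it has to be updated from the Looper thread it was created on.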
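The one-Looper, many-Handlers relationship can be illustrated with a single-thread executor standing in for the Looper and its MessageQueue. SimpleHandler and the simulated thread name are assumptions for this demo, not Android APIs. The only point is that every handler bound to the same queue runs its messages on the same thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OneLooperManyHandlers {
    // One thread, one queue: plays the role of UiThread's Looper and MessageQueue.
    static final ExecutorService LOOPER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "android.ui (simulated)");
        t.setDaemon(true);
        return t;
    });

    // Hypothetical stand-in for android.os.Handler: many instances, one queue.
    static final class SimpleHandler {
        private final ExecutorService looper;

        SimpleHandler(ExecutorService looper) {
            this.looper = looper;
        }

        // Queues the message; the Future yields the name of the thread that handled it.
        Future<String> post(Runnable handleMessage) {
            return looper.submit(() -> {
                handleMessage.run();
                return Thread.currentThread().getName();
            });
        }
    }

    // Convenience: post an empty message and wait for the handling thread's name.
    static String threadOf(SimpleHandler handler) {
        try {
            return handler.post(() -> { }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```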
Finally, how do UiThread, IoThread, DisplayThread and FgThread differ?
a. Their thread names:
android.ui, android.io, android.display and android.fg respectively.
b. Their priorities (lower values mean higher priority):
UiThread --> Process.THREAD_PRIORITY_FOREGROUND (-2)
IoThread, FgThread --> android.os.Process.THREAD_PRIORITY_DEFAULT (0)
DisplayThread --> Process.THREAD_PRIORITY_DISPLAY + 1 (-3)
c. Their typical users:
UiThread --> ActivityManagerService
DisplayThread --> WindowManagerService, InputManagerService, DisplayManagerService
IoThread --> PackageInstallerService, StorageManagerService, BluetoothManagerService
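8. Summary
1. Watchdog's core objects are mHandlerCheckers and mMonitorChecker.
mHandlerCheckers: detect whether a thread's message queue is blocked.
mMonitorChecker: detect whether a core system service holds its lock for too long.
2. Every HandlerChecker posts itself to the front of its thread's message queue with postAtFrontOfQueue(). It counts as timed out if it has not run within the timeout. mHandler.getLooper().getQueue().isPolling() is only a shortcut that treats an idle thread with no monitors as healthy. mMonitorChecker additionally runs each monitor() on the foreground thread; for most services that just enters synchronized (this). BinderThreadMonitor is the special case: it blocks while the number of executing binder threads has reached the configured maximum.
3. After a timeout the system writes a series of logs, and these are the starting point for analyzing the hang.
4. After a timeout Watchdog kills its own process, so system_server comes back with a new process ID.
5. Going further: could an app use the same approach to detect similar hangs in itself?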
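One possible answer, sketched under assumptions, is to apply the same post-and-wait idea to the thread you care about. In plain Java the target is any Executor. In an app it would presumably be a Handler on Looper.getMainLooper(), passed as handler::post. AppWatchdog and isResponsive() are hypothetical names. A real implementation would call this periodically from its own background thread.

```java
import java.util.concurrent.*;

// Minimal sketch of an app-side watchdog in the same spirit as HandlerChecker:
// post a probe to the watched thread and report it as stuck if the probe has
// not run within timeoutMs. Hypothetical names; plain Java only.
class AppWatchdog {
    static boolean isResponsive(Executor target, long timeoutMs) {
        CountDownLatch ran = new CountDownLatch(1);
        target.execute(ran::countDown); // the probe, like postAtFrontOfQueue(this)
        try {
            return ran.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Unlike postAtFrontOfQueue(), a plain execute() or post() puts the probe at the back of the queue. A long backlog of messages can therefore also count as unresponsive here.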