How Classes Are Loaded (Part 1)

2021-08-12  浅墨入画

Analysis of _objc_init

First, look at the source of _objc_init in libobjc:

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
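    // Read the environment variables that affect the runtime; help can also be printed with export OBJC_HELP=1
    environ_init();
    // Bind thread keys, e.g. the destructor for per-thread data
    tls_init();
    // Run C++ static constructors. libc calls _objc_init() before dyld would call our static constructors, so we have to do it ourselves
    static_init();
    // Initialize the runtime environment, mainly unattachedCategories and allocatedClasses (the category and class tables)
    runtime_init();
    // Initialize libobjc's exception handling system
    exception_init();
#if __OBJC2__
    // Initialize the method cache machinery
    cache_t::init();
#endif
    // Initialize the trampoline machinery. Normally this does nothing, as everything is initialized lazily, but for certain processes the trampolines dylib is loaded eagerly
    _imp_implementationWithBlock_init();

    /*
     _dyld_objc_notify_register -- where the callbacks are registered with dyld
     - For use by the objc runtime only
     - Registers handlers to be called when objc images are mapped, unmapped and initialized; dyld calls the mapped function with an array of images that contain an objc_image_info section
     
     map_images: called when dyld maps an image into memory
     load_images: called when dyld initializes an image
     unmap_image: called when dyld removes an image
     */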
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);
    // map_images()
    // load_images()
#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}

[Figure: objc_init]

environ_init: environment variable initialization
void environ_init(void) 
{
    // ... some logic omitted
    if (PrintHelp  ||  PrintOptions) {
        // ... some logic omitted
        for (size_t i = 0; i < sizeof(Settings)/sizeof(Settings[0]); i++) {
            const option_t *opt = &Settings[i];            
            if (PrintHelp) _objc_inform("%s: %s", opt->env, opt->help);
            if (PrintOptions && *opt->var) _objc_inform("%s is set", opt->env);
        }
    }
}
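
The loop above walks a table of option records and prints the help text and the "is set" line for each. A minimal sketch of the same pattern follows. The Option struct, the DEMO_* variable names and the rule that only the value YES enables a flag are assumptions made for illustration; they are not taken from libobjc.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical option record, loosely modeled on the fields used above.
struct Option {
    bool *var;          // flag controlled by the option
    const char *env;    // environment variable name
    const char *help;   // one-line description
};

static bool PrintImages = false;
static bool PrintLoadMethods = false;

// Demo table; the DEMO_* names are made up for this sketch.
static const Option kSettings[] = {
    {&PrintImages,      "DEMO_PRINT_IMAGES",       "log image names as they are loaded"},
    {&PrintLoadMethods, "DEMO_PRINT_LOAD_METHODS", "log calls to +load methods"},
};

// Set each flag whose variable equals "YES"; return how many flags are on.
int applyEnvironment() {
    int enabled = 0;
    for (const Option &opt : kSettings) {
        assert(opt.var != nullptr);
        const char *value = std::getenv(opt.env);
        *opt.var = (value != nullptr && std::strcmp(value, "YES") == 0);
        if (*opt.var) enabled++;
    }
    return enabled;
}

// Print the help line for every option and "is set" for the enabled ones.
void printOptions() {
    for (const Option &opt : kSettings) {
        std::printf("%s: %s\n", opt.env, opt.help);
        if (*opt.var) std::printf("%s is set\n", opt.env);
    }
}
```

Calling applyEnvironment() once at startup and printOptions() on request mirrors the split between reading the variables and reporting them.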

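There are two ways to print all of the environment variables; one of them is to set OBJC_HELP in the terminal before launching the program:

~ export OBJC_HELP=1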
objc[21966]: OBJC_PRINT_IMAGES: log image and library names as they are loaded
objc[21966]: OBJC_PRINT_IMAGES is set
objc[21966]: OBJC_PRINT_IMAGE_TIMES: measure duration of image loading steps
objc[21966]: OBJC_PRINT_IMAGE_TIMES is set
objc[21966]: OBJC_PRINT_LOAD_METHODS: log calls to class and category +load methods
objc[21966]: OBJC_PRINT_LOAD_METHODS is set
objc[21966]: OBJC_PRINT_INITIALIZE_METHODS: log calls to class +initialize methods
objc[21966]: OBJC_PRINT_INITIALIZE_METHODS is set
objc[21966]: OBJC_PRINT_RESOLVED_METHODS: log methods created by +resolveClassMethod: and +resolveInstanceMethod:
objc[21966]: OBJC_PRINT_RESOLVED_METHODS is set
objc[21966]: OBJC_PRINT_CLASS_SETUP: log progress of class and category setup
objc[21966]: OBJC_PRINT_CLASS_SETUP is set
objc[21966]: OBJC_PRINT_PROTOCOL_SETUP: log progress of protocol setup
objc[21966]: OBJC_PRINT_PROTOCOL_SETUP is set
objc[21966]: OBJC_PRINT_IVAR_SETUP: log processing of non-fragile ivars
objc[21966]: OBJC_PRINT_IVAR_SETUP is set
objc[21966]: OBJC_PRINT_VTABLE_SETUP: log processing of class vtables
objc[21966]: OBJC_PRINT_VTABLE_SETUP is set
...

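These environment variables can all be configured in Xcode under target -> Edit Scheme -> Run -> Arguments -> Environment Variables.

Environment variable: OBJC_DISABLE_NONPOINTER_ISA

Take OBJC_DISABLE_NONPOINTER_ISA as an example. The two lldb sessions below print the memory of an object p, first without the variable and then with it set to YES: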
(lldb) x/4gx p
0x10120d7d0: 0x011d8001000082d1 0x0000000000000000
0x10120d7e0: 0x0000000000000000 0x0000000000000000
(lldb) p/t 0x011d8001000082d1
(long) $1 = 0b0000000100011101100000000000000100000000000000001000001011010001
(lldb) x/4gx p
0x1007057f0: 0x00000001000082d0 0x0000000000000000
0x100705800: 0x0000000000000000 0x0000000000000000
(lldb) p/t 0x00000001000082d0
(long) $1 = 0b0000000000000000000000000000000100000000000000001000001011010000

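OBJC_DISABLE_NONPOINTER_ISA switches the isa optimization on and off, which changes how the isa field is laid out in memory. In the first dump the lowest bit of isa is 1, so isa is a non-pointer isa with extra fields packed around the class address. In the second dump, taken with the variable set to YES, the lowest bit is 0 and isa is a plain class pointer.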
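
The relationship between the two isa values in the dumps can be checked with a few lines of bit arithmetic. This is a sketch under one assumption: the mask below is the x86_64 class-pointer mask, and other architectures use different masks.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: x86_64 layout, where the class pointer occupies bits 3..46.
static const uint64_t kIsaMask = 0x00007ffffffffff8ULL;

// Bit 0 distinguishes a packed (non-pointer) isa from a raw class pointer.
bool isNonpointerIsa(uint64_t isa) { return (isa & 1) != 0; }

// Recover the class address from either form of isa.
uint64_t classAddress(uint64_t isa) {
    return isNonpointerIsa(isa) ? (isa & kIsaMask) : isa;
}

// Check the two isa values printed in the lldb sessions.
void checkDumps() {
    assert(isNonpointerIsa(0x011d8001000082d1ULL));
    assert(!isNonpointerIsa(0x00000001000082d0ULL));
    assert(classAddress(0x011d8001000082d1ULL) == 0x00000001000082d0ULL);
}
```

Masking the packed value yields exactly the address seen in the second dump, so both objects point at the same class.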

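Environment variable: OBJC_PRINT_LOAD_METHODS

OBJC_PRINT_LOAD_METHODS logs every call to a +load method, which makes it easy to find the +load methods worth removing or deferring when optimizing launch time.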

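tls_init: binding the thread keys

This sets up the thread-local storage key used by the runtime and registers the destructor for the per-thread data. The source is as follows: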

void tls_init(void)
{
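#if SUPPORT_DIRECT_THREAD_KEYS // a fixed, pre-reserved thread-local storage key is available
    pthread_key_init_np(TLS_DIRECT_KEY, &_objc_pthread_destroyspecific); // initialize the key and register the destructor
#else
    _objc_pthread_key = tls_create(&_objc_pthread_destroyspecific); // create a key and register the destructor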
#endif
}
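
The idea of a thread key with a destructor can be tried out with the portable pthread API. The sketch below uses pthread_key_create instead of the private calls shown above; the names gKey, gDestroyed, worker and runTlsDemo are made up for the example.

```cpp
#include <cassert>
#include <pthread.h>

static pthread_key_t gKey;
static int gDestroyed = 0;   // counts destructor calls (demo only)

// Called by the system when a thread exits with a non-null value stored under gKey.
static void destroyThreadData(void *value) {
    delete static_cast<int *>(value);
    gDestroyed++;
}

static void *worker(void *) {
    // Each thread stores its own value under the shared key.
    int rc = pthread_setspecific(gKey, new int(42));
    assert(rc == 0);
    (void)rc;
    return nullptr;
}

// Create the key, run one thread that uses it, and return the destructor count.
int runTlsDemo() {
    gDestroyed = 0;
    int rc = pthread_key_create(&gKey, &destroyThreadData);
    assert(rc == 0);
    pthread_t thread;
    rc = pthread_create(&thread, nullptr, &worker, nullptr);
    assert(rc == 0);
    (void)rc;
    pthread_join(thread, nullptr);   // the destructor has run by the time join returns
    pthread_key_delete(gKey);
    return gDestroyed;
}
```

The destructor runs once per exiting thread that stored a value, which is how per-thread runtime data gets cleaned up without the thread doing it explicitly.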
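
static_init: running the system-level C++ static constructors

This runs the system-level C++ static constructors. libc calls _objc_init before dyld would call libobjc's own static constructors, so libobjc runs them itself here. As a result, the system-level C++ constructors run before any user-defined C++ constructors.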
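
Running constructors "by hand" amounts to walking a list of function pointers and calling each one in order. The sketch below shows only that pattern. The initializer functions and the kInitializers table are hypothetical stand-ins, not the list libobjc reads from its own image.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

using Initializer = void (*)();

static std::string gLog;   // records the order in which initializers ran

static void initLocks()  { gLog += "locks;"; }
static void initTables() { gLog += "tables;"; }

// Hypothetical stand-in for the initializer list a library would read
// from its own image.
static const Initializer kInitializers[] = { &initLocks, &initTables };

// Run every initializer in list order; return how many were run.
size_t runStaticInitializers() {
    size_t count = sizeof(kInitializers) / sizeof(kInitializers[0]);
    for (size_t i = 0; i < count; i++) {
        assert(kInitializers[i] != nullptr);
        kInitializers[i]();
    }
    return count;
}

const std::string &initLog() { return gLog; }
```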

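runtime_init: runtime environment initialization

This initializes the runtime in two parts: the category table (unattachedCategories) and the class table (allocatedClasses).

exception_init: initializing libobjc's exception handling system

This initializes libobjc's exception handling system and registers a callback so that uncaught exceptions can be observed. The source is as follows: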

void exception_init(void)
{
    old_terminate = std::set_terminate(&_objc_terminate);
}
/***********************************************************************
* _objc_terminate
* Custom std::terminate handler.
*
* The uncaught exception callback is implemented as a std::terminate handler. 
* 1. Check if there's an active exception
* 2. If so, check if it's an Objective-C exception
* 3. If so, call our registered callback with the object.
* 4. Finally, call the previous terminate handler.
**********************************************************************/
static void (*old_terminate)(void) = nil;
static void _objc_terminate(void)
{
    if (PrintExceptions) {
        _objc_inform("EXCEPTIONS: terminating");
    }

    if (! __cxa_current_exception_type()) {
        // No current exception.
        (*old_terminate)();
    }
    else {
        // There is a current exception. Check if it's an objc exception.
        @try {
            __cxa_rethrow();
        } @catch (id e) {
            // It's an objc object. Call Foundation's handler, if any.
            (*uncaught_handler)((id)e);
            (*old_terminate)();
        } @catch (...) {
            // It's not an objc object. Continue to C++ terminate.
            (*old_terminate)();
        }
    }
}
/***********************************************************************
* objc_setUncaughtExceptionHandler
* Set a handler for uncaught Objective-C exceptions. 
* Returns the previous handler. 
**********************************************************************/
objc_uncaught_exception_handler 
objc_setUncaughtExceptionHandler(objc_uncaught_exception_handler fn)
{
    objc_uncaught_exception_handler result = uncaught_handler;
    uncaught_handler = fn;
    return result;
}
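
Crash categories

A crash is mainly caused by an unhandled signal. Such signals come from three sources, and crashes are correspondingly divided into three kinds.

For application-level exceptions, an uncaught-exception handler can be registered through NSSetUncaughtExceptionHandler, which makes it possible to keep the thread alive long enough to collect and upload a crash log.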
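
exception_init relies on the standard std::set_terminate mechanism: install a handler, keep the previous one, and chain to it. The sketch below shows that chaining in plain C++. The names demoTerminate and installDemoTerminate are made up, and the handler only logs before deferring to the previous handler.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>

static std::terminate_handler gPreviousHandler = nullptr;

// Demo handler: report, then defer to whatever was installed before.
static void demoTerminate() {
    std::fputs("demo: terminating\n", stderr);
    if (gPreviousHandler) gPreviousHandler();
    std::abort();   // a terminate handler must not return
}

// Install the demo handler once and remember the handler it replaces.
void installDemoTerminate() {
    if (std::get_terminate() == &demoTerminate) return;   // already installed
    gPreviousHandler = std::set_terminate(&demoTerminate);
    assert(std::get_terminate() == &demoTerminate);
}

bool demoTerminateInstalled() {
    return std::get_terminate() == &demoTerminate;
}
```

Keeping the old handler matters because other libraries may have installed their own; calling it last preserves their behavior.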

cache_t::init: cache initialization

This initializes the cache machinery. The source is as follows:

void cache_t::init()
{
#if HAVE_TASK_RESTARTABLE_RANGES
    mach_msg_type_number_t count = 0;
    kern_return_t kr;

    while (objc_restartableRanges[count].location) {
        count++;
    }
    // Register a set of restartable ranges for the current task
    kr = task_restartable_ranges_register(mach_task_self(),
                                          objc_restartableRanges, count);
    if (kr == KERN_SUCCESS) return;
    _objc_fatal("task_restartable_ranges_register failed (result 0x%x: %s)",
                kr, mach_error_string(kr));
#endif // HAVE_TASK_RESTARTABLE_RANGES
}
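
_imp_implementationWithBlock_init: initializing the trampoline machinery

This function initializes the trampoline machinery. Normally it does nothing, because everything is initialized lazily, but for certain processes it eagerly loads libobjc-trampolines.dylib. The source is as follows: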

/// Initialize the trampoline machinery. Normally this does nothing, as
/// everything is initialized lazily, but for certain processes we eagerly load
/// the trampolines dylib.
void
_imp_implementationWithBlock_init(void)
{
#if TARGET_OS_OSX
    // Eagerly load libobjc-trampolines.dylib in certain processes. Some
    // programs (most notably QtWebEngineProcess used by older versions of
    // embedded Chromium) enable a highly restrictive sandbox profile which
    // blocks access to that dylib. If anything calls
    // imp_implementationWithBlock (as AppKit has started doing) then we'll
    // crash trying to load it. Loading it here sets it up before the sandbox
    // profile is enabled and blocks it.
    // This fixes EA Origin (rdar://problem/50813789)
    // and Steam (rdar://problem/55286131)
    if (__progname &&
        (strcmp(__progname, "QtWebEngineProcess") == 0 ||
         strcmp(__progname, "Steam Helper") == 0)) {
        Trampolines.Initialize();
    }
#endif
}
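
_dyld_objc_notify_register: registering with dyld

The implementation of this function was covered in the article on application loading; it lives in the dyld source. The call in libobjc and the declaration of _dyld_objc_notify_register are shown below:

// Note that map_images is passed with an explicit &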
_dyld_objc_notify_register(&map_images, load_images, unmap_image);

void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped);

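The three parameters mean the following:

mapped (map_images): called when dyld maps an image into memory
init (load_images): called when dyld initializes an image
unmapped (unmap_image): called when dyld removes an image

How dyld and objc are connected

The implementation and the call are shown below; together they show how dyld and objc are tied to each other.

===> dyld source: implementation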
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}

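===> libobjc source: call site
_dyld_objc_notify_register(&map_images, load_images, unmap_image);

From the above it follows that mapped is map_images, init is load_images, and unmapped is unmap_image: dyld stores these three functions and calls back into libobjc through them.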
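
The registration pattern itself is small: one side stores three function pointers, the other side supplies them. The sketch below is a simplified model of that handshake. The callback signatures, the demo functions and the image name libDemo.dylib are all invented for illustration and are much simpler than the real dyld types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified callback types; the real dyld ones take more arguments.
using MappedFn   = void (*)(unsigned count, const char *const paths[]);
using InitFn     = void (*)(const char *path);
using UnmappedFn = void (*)(const char *path);

// Loader side: the three callbacks stored at registration time.
static MappedFn   gMapped   = nullptr;
static InitFn     gInit     = nullptr;
static UnmappedFn gUnmapped = nullptr;

void registerNotifiers(MappedFn mapped, InitFn init, UnmappedFn unmapped) {
    assert(mapped && init && unmapped);
    gMapped = mapped;
    gInit = init;
    gUnmapped = unmapped;
}

// Loader side: walk one image through map, init and unmap.
void simulateImageLifecycle(const char *path) {
    const char *paths[] = { path };
    gMapped(1, paths);
    gInit(path);
    gUnmapped(path);
}

// Runtime side: callbacks that only record what happened.
static std::vector<std::string> gEvents;
static void demoMapImages(unsigned count, const char *const paths[]) {
    for (unsigned i = 0; i < count; i++) gEvents.push_back(std::string("map:") + paths[i]);
}
static void demoLoadImages(const char *path) { gEvents.push_back(std::string("load:") + path); }
static void demoUnmapImage(const char *path) { gEvents.push_back(std::string("unmap:") + path); }

// Register the callbacks, run one image through the loader, return the event log.
std::vector<std::string> runNotifierDemo() {
    gEvents.clear();
    registerNotifiers(&demoMapImages, demoLoadImages, demoUnmapImage);
    simulateImageLifecycle("libDemo.dylib");
    return gEvents;
}
```

The loader never needs to know what the callbacks do; it only needs their addresses, which is why the runtime can live in a separate library.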

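map_images and load_images

Source code is compiled into a Mach-O executable, and the data is then read from Mach-O into memory, as shown in the figure below.

[Figure: data loading flow]

Introducing the read_images flow

In the call to _dyld_objc_notify_register above, why does map_images have a & in front of it while load_images does not? As far as the language is concerned there is no difference: in C and C++ a function name converts implicitly to a pointer to that function, so &map_images and map_images pass the same address.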
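
The equivalence of the two spellings is easy to confirm. In this small sketch, demoCallback and sameFunctionPointer are names chosen for the example.

```cpp
#include <cassert>

using Callback = void (*)();

static void demoCallback() {}

// Both spellings produce the same function pointer.
bool sameFunctionPointer() {
    Callback withAmpersand = &demoCallback;
    Callback withoutAmpersand = demoCallback;   // function-to-pointer conversion
    return withAmpersand == withoutAmpersand;
}
```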

The main job of map_images is to load the class information in Mach-O into memory. Its source is as follows:

/***********************************************************************
* map_images
* Process the given images which are being mapped in by dyld.
* Calls ABI-agnostic code after taking ABI-specific locks.
*
* Locking: write-locks runtimeLock
**********************************************************************/
void
map_images(unsigned count, const char * const paths[],
           const struct mach_header * const mhdrs[])
{
    mutex_locker_t lock(runtimeLock);
    return map_images_nolock(count, paths, mhdrs);
}

Step into map_images_nolock; its key call is _read_images:

void 
map_images_nolock(unsigned mhCount, const char * const mhPaths[],
                  const struct mach_header * const mhdrs[])
{
    // ... code omitted
    // Find all images with Objective-C metadata.
    hCount = 0;

    // Count classes. Size various table based on the total.
    int totalClasses = 0;
    int unoptimizedTotalClasses = 0;
    {
     // ... omitted
    }
#else
                _getObjcSelectorRefs(hi, &selrefCount);
#endif
    // ... omitted
    if (hCount > 0) {
        // Read the images
        _read_images(hList, hCount, totalClasses, unoptimizedTotalClasses);
    }

    firstTime = NO;
    
    // Call image load funcs after everything is set up.
    for (auto func : loadImageFuncs) {
        for (uint32_t i = 0; i < mhCount; i++) {
            func(mhdrs[i]);
        }
    }
}

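Main flow of read_images

_read_images loads class information, that is, classes, categories, protocols and so on. Its source proceeds through a series of steps.

Main flow analysis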
    if (!doneOnce) {
        doneOnce = YES;
        launchTime = YES;
        // ... omitted
        int namedClassesSize = 
            (isPreoptimized() ? unoptimizedTotalClasses : totalClasses) * 4 / 3;
        gdb_objc_realized_classes =
            NXCreateMapTable(NXStrValueMapPrototype, namedClassesSize);

        ts.log("IMAGE TIMES: first time tasks");
    }

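This creates a hash table, gdb_objc_realized_classes, that holds classes by name so that a class can be looked up quickly. It stores named classes that are not in the dyld shared cache, whether realized or not, and its capacity is 4/3 of the number of classes.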
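
The sizing rule can be written as a small function. The function name and the sample class counts used with it are made up; only the expression comes from the source above.

```cpp
#include <cassert>

// Mirrors the sizing expression above: pick a class count, then scale by 4/3.
int namedClassesSize(bool preoptimized, int totalClasses, int unoptimizedTotalClasses) {
    assert(totalClasses >= 0 && unoptimizedTotalClasses >= 0);
    int classCount = preoptimized ? unoptimizedTotalClasses : totalClasses;
    return classCount * 4 / 3;   // integer arithmetic, rounds down
}
```

Sizing the table at 4/3 of the class count means it is about three-quarters full once every class has been inserted, which keeps lookups fast without wasting much memory.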

    // A SEL is a name plus an address, i.e. a string with an address
    // Fix up the @selector references that were left inconsistent at compile time
    static size_t UnfixedSelectors;
    {
        mutex_locker_t lock(selLock);
        for (EACH_HEADER) {
            if (hi->hasPreoptimizedSelectors()) continue;

            bool isBundle = hi->isBundle();
            // _getObjc2SelectorRefs reads the __objc_selrefs section from Mach-O
            SEL *sels = _getObjc2SelectorRefs(hi, &count);
            UnfixedSelectors += count;
            for (i = 0; i < count; i++) {
                const char *name = sel_cname(sels[i]);
                SEL sel = sel_registerNameNoLock(name, isBundle);
                if (sels[i] != sel) {
                    sels[i] = sel;
                }
            }
        }
    }
    ts.log("IMAGE TIMES: fix up selector references");

// Debug output from a breakpoint at sels[i] = sel;
(lldb) p/x sel
(SEL) $0 = 0x00000001f134a483 "retain"
(lldb) p/x sels[i]
(SEL) $1 = 0x00000001004c0c7f "retain"

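Here a SEL is a string with an address. Two selectors can have the same name but different addresses, as sel and sels[i] do in the output above, so the references have to be fixed up to a single address.

Why can the same method name have different addresses, when one would expect the name and the address to match?
Answer: a process contains many libraries, such as Foundation and CoreFoundation, and almost every one of them uses a retain method. Each image carries its own copy of the name, located at an offset from wherever that image was loaded, so the same name sits at a different address in each image. The runtime therefore rewrites every selector reference to point at the one registered selector.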
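
The fix-up is essentially string interning: every distinct name gets one canonical address, and each reference is rewritten to it. The sketch below models that with a standard container. The table, registerName and fixupSelectorRefs are hypothetical names, and the real runtime uses its own table and locking.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical selector table: one canonical string per distinct name.
static std::unordered_set<std::string> gSelectorTable;

// Return the canonical pointer for a name, inserting it on first use.
const char *registerName(const char *name) {
    assert(name != nullptr);
    // Elements of an unordered_set never move, so c_str() stays valid.
    return gSelectorTable.emplace(name).first->c_str();
}

// Rewrite each reference so equal names share one address; return how many changed.
int fixupSelectorRefs(const char *refs[], int count) {
    int changed = 0;
    for (int i = 0; i < count; i++) {
        const char *canonical = registerName(refs[i]);
        if (refs[i] != canonical) {
            refs[i] = canonical;
            changed++;
        }
    }
    return changed;
}
```

Once all references share an address, two selectors can be compared with a single pointer comparison instead of a string comparison.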

// Discover classes. Fix up unresolved future classes. Mark bundle classes.
    bool hasDyldRoots = dyld_shared_cache_some_image_overridden();
    // Read the classes: readClass
    for (EACH_HEADER) {
        if (! mustReadClasses(hi, hasDyldRoots)) {
            // Image is sufficiently optimized that we need not call readClass()
            continue;
        }
        // Get all classes from the compiled class list, i.e. the __objc_classlist section in Mach-O; the result is a pointer to classref_t
        classref_t const *classlist = _getObjc2ClassList(hi, &count);

        bool headerIsBundle = hi->isBundle();
        bool headerIsPreoptimized = hi->hasPreoptimizedClasses();

        for (i = 0; i < count; i++) {
            // At this point cls is only an address
            Class cls = (Class)classlist[i];
            // Read the class; only after this step is cls associated with a name
            Class newCls = readClass(cls, headerIsBundle, headerIsPreoptimized);

            if (newCls != cls  &&  newCls) {
                // Class was moved but not deleted. Currently this occurs 
                // only when the new class resolved a future class.
                // Non-lazily realize the class below.
                // Record the resolved future class so that it can be realized later
                resolvedFutureClasses = (Class *)
                    realloc(resolvedFutureClasses, 
                            (resolvedFutureClassCount+1) * sizeof(Class));
                resolvedFutureClasses[resolvedFutureClassCount++] = newCls;
            }
        }
    }

    ts.log("IMAGE TIMES: discover classes");

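Setting a breakpoint at Class cls = (Class)classlist[i]; gives the debug output shown in the figure below.

[Figure: debugging cls]

Before readClass runs, cls is only an address. Afterwards cls is associated with a class name. So up to this point, only the address and the name of the class have been recorded.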

Analysis of readClass

As shown above, cls gets its class name through readClass. Now look at readClass itself; its key calls are addNamedClass and addClassTableEntry.

/***********************************************************************
* readClass
* Read a class and metaclass as written by a compiler.
* Returns the new class pointer. This could be: 
* - cls
* - nil  (cls has a missing weak-linked superclass)
* - something else (space for this class was reserved by a future class)
*
* Note that all work performed by this function is preflighted by 
* mustReadClasses(). Do not change this function without updating that one.
*
* Locking: runtimeLock acquired by map_images or objc_readClassPair
**********************************************************************/
Class readClass(Class cls, bool headerIsBundle, bool headerIsPreoptimized)
{
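    // To stop on a custom class, add a check of your own
    const char *mangledName = cls->nonlazyMangledName();
    const char *LGPersonName = "LGPerson";
    if (mangledName && strcmp(mangledName, LGPersonName) == 0) { // does the class name match? (the name may be null)
        // An ordinary hand-written class: how is it handled?
        printf("%s -KC: class under study: - %s\n",__func__,mangledName);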
    }
    
    if (missingWeakSuperclass(cls)) {
        // No superclass (probably weak-linked). 
        // Disavow any knowledge of this subclass.
        if (PrintConnecting) {
            _objc_inform("CLASS: IGNORING class '%s' with "
                         "missing weak-linked superclass", 
                         cls->nameForLogging());
        }
        addRemappedClass(cls, nil);
        cls->setSuperclass(nil);
        return nil;
    }
    
    cls->fixupBackwardDeployingStableSwift();

    Class replacing = nil;
    if (mangledName != nullptr) {
        if (Class newCls = popFutureNamedClass(mangledName)) {
            // This name was previously allocated as a future class.
            // Copy objc_class to future class's struct.
            // Preserve future's rw data block.

            if (newCls->isAnySwift()) {
                _objc_fatal("Can't complete future class request for '%s' "
                            "because the real class is too big.",
                            cls->nameForLogging());
            }

            class_rw_t *rw = newCls->data();
            const class_ro_t *old_ro = rw->ro();
            memcpy(newCls, cls, sizeof(objc_class));

            // Manually set address-discriminated ptrauthed fields
            // so that newCls gets the correct signatures.
            newCls->setSuperclass(cls->getSuperclass());
            newCls->initIsa(cls->getIsa());

            rw->set_ro((class_ro_t *)newCls->data());
            newCls->setData(rw);
            freeIfMutable((char *)old_ro->getName());
            free((void *)old_ro);

            addRemappedClass(cls, newCls);

            replacing = cls;
            cls = newCls;
        }
    }
    
    if (headerIsPreoptimized  &&  !replacing) {
        // class list built in shared cache
        // fixme strict assert doesn't work because of duplicates
        // ASSERT(cls == getClass(name));
        ASSERT(mangledName == nullptr || getClassExceptSomeSwift(mangledName));
    } else {
        if (mangledName) { //some Swift generic classes can lazily generate their names
            addNamedClass(cls, mangledName, replacing);
        } else {
            Class meta = cls->ISA();
            const class_ro_t *metaRO = meta->bits.safe_ro();
            ASSERT(metaRO->getNonMetaclass() && "Metaclass with lazy name must have a pointer to the corresponding nonmetaclass.");
            ASSERT(metaRO->getNonMetaclass() == cls && "Metaclass nonmetaclass pointer must equal the original class.");
        }
        addClassTableEntry(cls);
    }

    // for future reference: shared cache never contains MH_BUNDLEs
    if (headerIsBundle) {
        cls->data()->flags |= RO_FROM_BUNDLE;
        cls->ISA()->data()->flags |= RO_FROM_BUNDLE;
    }
    
    return cls;
}
    const char *nonlazyMangledName() const {
        return bits.safe_ro()->getName();
    }

    const class_ro_t *safe_ro() const {
        class_rw_t *maybe_rw = data();
        if (maybe_rw->flags & RW_REALIZED) {
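            // maybe_rw is rw
            return maybe_rw->ro();
        } else {
            // maybe_rw is actually ro
            return (class_ro_t *)maybe_rw;
        }
    }

This returns the non-lazy class name. If the class has already been realized (RW_REALIZED is set), data() is a class_rw_t and the ro is fetched from it; otherwise data() still points directly at the class_ro_t.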

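Summary

In summary, readClass reads a class from Mach-O into memory by inserting it into the tables. At this point the class carries only two pieces of information, its address and its name; the data stored in Mach-O has not been read yet.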
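
The two insertions done by readClass can be modeled with a name-to-class map and a set of known class addresses. This is a sketch only: DemoClass and the function names are invented, and the real addNamedClass and addClassTableEntry do more work, for example handling metaclasses and duplicate names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a class as seen right after readClass:
// nothing but an address and a name.
struct DemoClass { const char *name; };

static std::unordered_map<std::string, DemoClass *> gNamedClasses;   // name -> class
static std::unordered_set<DemoClass *> gAllocatedClasses;            // known class addresses

// Record a class under its name and mark its address as known.
void readDemoClass(DemoClass *cls) {
    assert(cls != nullptr && cls->name != nullptr);
    gNamedClasses[cls->name] = cls;
    gAllocatedClasses.insert(cls);
}

// Look a class up by name; returns nullptr if it was never read.
DemoClass *lookUpDemoClass(const char *name) {
    auto it = gNamedClasses.find(name);
    return it == gNamedClasses.end() ? nullptr : it->second;
}

bool isKnownClass(DemoClass *cls) { return gAllocatedClasses.count(cls) != 0; }
```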

Class loading
    // +load handled by prepare_load_methods()

    // Realize non-lazy classes (for +load methods and static instances)
    for (EACH_HEADER) {
        classref_t const *classlist = hi->nlclslist(&count);
        for (i = 0; i < count; i++) {
            Class cls = remapClass(classlist[i]);
            if (!cls) continue;

            addClassTableEntry(cls);

            if (cls->isSwiftStable()) {
                if (cls->swiftMetadataInitializer()) {
                    _objc_fatal("Swift class %s with a metadata initializer "
                                "is not allowed to be non-lazy",
                                cls->nameForLogging());
                }
                // fixme also disallow relocatable classes
                // We can't disallow all Swift classes because of
                // classes like Swift.__EmptyArrayStorage
            }
            
            const char *mangledName = cls->nonlazyMangledName();
            if (mangledName && strcmp(mangledName, "LGPerson") == 0)
            {
                printf("%s LGPerson....\n",__func__);
            }
            
            realizeClassWithoutSwift(cls, nil);
        }
    }

    ts.log("IMAGE TIMES: realize non-lazy classes");

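This step realizes the non-lazy classes. A class is non-lazy, and therefore takes this path, when it implements a +load method.

The implementation of realizeClassWithoutSwift will be explored in the next article.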
