Application Loading

2021-07-27  Wayne_Wang

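You have been an iOS developer for a while, but do you really understand the main flow by which an application is loaded? How does the app we build actually start running? Below we explore the low-level loading flow that happens out of sight.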

Preface

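Our app is code we typed line by line, but that code cannot run by itself. It has to be loaded into memory, where some mechanism invokes it and eventually enters the main function. The whole process goes through many stages and involves calling and loading many libraries. First, our code and the libraries it needs are compiled and linked into an executable file; only then is that executable loaded and launched. See the figure below: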

Compilation process:
编译过程.png

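You can run the executable (exec) by double-clicking it or dragging it into a terminal. P.S.: a binary built for the iPhone simulator or a real device requires dealing with code signing and similar issues, so it is easier to try this with a project built for macOS.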

A note on the difference between static and dynamic libraries:
动静态库链接.png

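As the static/dynamic linking figure above shows, linking a static library copies its code into every binary that links it, so the same library can end up duplicated. A dynamic library is not copied: one dynamic library is shared directly by everything that links it. That is why most of Apple's system libraries are dynamic libraries, which helps save memory.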

How Application Loading Works

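The preface covered how our code and libraries are compiled into an executable, but how does that executable get loaded into memory? In earlier articles we explored _objc_init and read parts of the objc source; how do those get called and started? Below we explore this flow.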

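Thought: we want to know what happens before the app starts, but we do not yet know where to begin. Could we work backwards, starting from the moment the app enters main and inspecting the disassembly or the call stack for clues?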

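Let's try it. First create an app project (ZYProjectTwelfth001), then add a + (void)load; method to the view controller. As earlier articles explained, load runs before main, so it also helps us see what happens before main. Then set a breakpoint in main and look at the disassembly.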

2.png 3.png
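As a rough C++ analogy for why load is a useful probe: Clang and GCC support `__attribute__((constructor))`, which marks a function to be called before main() is entered. The names `eventLog` and `runsBeforeMain` are made up for illustration, and this is only a sketch of "code that runs before main", not what the Objective-C runtime itself does.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: a log that is safe to touch during static initialization,
// because a function-local static is constructed on first use.
std::vector<std::string>& eventLog() {
    static std::vector<std::string> log;
    return log;
}

// Clang/GCC extension: the loader's initializer machinery calls this function
// before main() is entered, which is roughly the phase in which +load runs too.
__attribute__((constructor))
static void runsBeforeMain() {
    eventLog().push_back("constructor");
}
```

By the time main() starts, `eventLog()` already holds one entry, so a breakpoint inside such a function shows the pre-main call stack in the same way a breakpoint in load does.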

Use Debug -> Debug Workflow -> Always Show Disassembly to view the disassembly.

4.png

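The stack shows that a start function is called before main, and that it lives in dyld, so we can try a symbolic breakpoint on it. I tried, and the breakpoint never hit. My guess is that the function is not actually named start at the lower level, or that something else prevents breaking on it. Since that path is a dead end, we use the load method, which runs before main, instead: set a breakpoint in load and print the stack with bt.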

5.png

The output confirms that the start before main does come from dyld's _dyld_start. After it, a number of other functions are called, for example:
initializeMainExecutable()
ImageLoader::runInitializers
ImageLoader::processInitializers
ImageLoader::recursiveInitialization
dyld::notifySingle
libobjc.A.dylib load_images

Next we dig into the dyld source to see whether we can find these functions. I am using dyld-852 here, which you can download if you need it.

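Before actually exploring dyld, let me borrow a diagram from an earlier author to explain what dyld is. We hear about dyld a lot: it is the dynamic linker. Its job is to load the libraries and images an app needs when the app launches.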

12.png

With a first understanding of dyld, we continue the search from above.

6.png

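Search globally for _dyld_start. (Because of C++ syntax, search for the dyldbootstrap namespace first and then for the start function inside it; the call is the namespace-qualified name dyldbootstrap::start.) As shown: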

7.png

At the final return of this start function we see a familiar name, main (though this is dyld's own _main, not the main function of our program). Let's step into it.

8.png

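Wow, this function is more than a thousand lines long; I had to fold the code to get a readable screenshot. Let's brace ourselves and read it. As usual with a function this large, we start from the return. Here is the code related to result, the value that gets returned: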

uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, int argc, const char* argv[], const char* envp[], const char* apple[], uintptr_t* startGlue)
{

/*
* (all preceding code omitted)
*/
#if TARGET_OS_OSX
        if ( gLinkContext.driverKit ) {
            result = (uintptr_t)sEntryOverride;
            if ( result == 0 )
                halt("no entry point registered");
            *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
        }
        else
#endif
        {
            // find entry point for main executable
            result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
            if ( result != 0 ) {
                // main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
                if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
                    *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
                else
                    halt("libdyld.dylib support not present for LC_MAIN");
            }
            else {
                // main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
                result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
                *startGlue = 0;
            }
        }
    }
    catch(const char* message) {
        syncAllImages();
        halt(message);
    }
    catch(...) {
        dyld::log("dyld: launch failed\n");
    }

    CRSetCrashLogMessage("dyld2 mode");
#if !TARGET_OS_SIMULATOR
    if (sLogClosureFailure) {
        // We failed to launch in dyld3, but dyld2 can handle it. synthesize a crash report for analytics
        dyld3::syntheticBacktrace("Could not generate launchClosure, falling back to dyld2", true);
    }
#endif

    if (sSkipMain) {
        notifyMonitoringDyldMain();
        if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
            dyld3::kdebug_trace_dyld_duration_end(launchTraceID, DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, 0, 0, 2);
        }
        ARIADNEDBG_CODE(220, 1);
        result = (uintptr_t)&fake_main;
        *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
    }

    return result;
}
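The entry-point choice at the end of _main can be sketched on its own. The types `FakeExecutable` and `EntryChoice` and the function `chooseEntry` are hypothetical stand-ins, not dyld APIs; the sketch only mirrors the if/else shown above (prefer LC_MAIN, fall back to LC_UNIXTHREAD) and leaves out the DriverKit and sSkipMain branches.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the two facts dyld reads from the main executable.
struct FakeExecutable {
    std::uintptr_t entryFromLCMain;        // 0 means "no LC_MAIN load command"
    std::uintptr_t entryFromLCUnixThread;  // used only as a fallback
};

struct EntryChoice {
    std::uintptr_t result;     // address handed back to the caller of _main
    std::uintptr_t startGlue;  // non-zero: a libdyld helper calls main() and then exit()
};

// Mirrors the if/else at the end of dyld::_main.
EntryChoice chooseEntry(const FakeExecutable& exe, std::uintptr_t helperGlue) {
    if (exe.entryFromLCMain != 0) {
        // LC_MAIN binaries need the helper in libdyld to call into main().
        if (helperGlue == 0)
            throw std::runtime_error("libdyld.dylib support not present for LC_MAIN");
        return {exe.entryFromLCMain, helperGlue};
    }
    // LC_UNIXTHREAD binaries carry their own "start" code, so no glue is set.
    return {exe.entryFromLCUnixThread, 0};
}
```

For example, `chooseEntry({0x1000, 0x2000}, 0x9000)` picks the LC_MAIN address and keeps the glue, while an executable whose LC_MAIN entry is 0 falls back to the LC_UNIXTHREAD address with no glue.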

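In the _main excerpt above, result is assigned in only a few places, most of them through sMainExecutable. The remaining assignment takes the address of fake_main, which turns out to be an empty function when we look at it. So we work backwards again and search for the code that uses sMainExecutable.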

uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, int argc, const char* argv[], const char* envp[], const char* apple[], uintptr_t* startGlue)
{
               /*
                * (all preceding code omitted)
                */
        /* ****************** weak binding ********************/
        
        // <rdar://problem/12186933> do weak binding only after all inserted images linked
        sMainExecutable->weakBind(gLinkContext);
        gLinkContext.linkingMainExecutable = false;

        sMainExecutable->recursiveMakeDataReadOnly(gLinkContext);

        CRSetCrashLogMessage("dyld: launch, running initializers");
    #if SUPPORT_OLD_CRT_INITIALIZATION
        // Old way is to run initializers via a callback from crt1.o
        if ( ! gRunInitializersOldWay ) 
            initializeMainExecutable(); 
    #else
        // run all initializers
        initializeMainExecutable(); 
    #endif

        // notify any montoring proccesses that this process is about to enter main()
        notifyMonitoringDyldMain();
        if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
            dyld3::kdebug_trace_dyld_duration_end(launchTraceID, DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, 0, 0, 2);
        }
        ARIADNEDBG_CODE(220, 1);

              /*
                * (all following code omitted; it is the result-related code shown above)
               */
}

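This code contains what we were looking for: the things we saw in the stack when we broke in load(), namely the binding, linking and loading of images and the handling of the main executable. That confirms we are on the right track. We keep tracing sMainExecutable upstream, this time starting from the beginning of the function: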

Part 1:
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, int argc, const char* argv[], const char* envp[], const char* apple[], uintptr_t* startGlue)
{
     /*
     * (all preceding code omitted)
     */
   /* ********************** instantiate the main executable ****************************/

  CRSetCrashLogMessage(sLoadingCrashMessage);
  // instantiate ImageLoader for main executable
  sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
  gLinkContext.mainExecutable = sMainExecutable;
  gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);

  /*
  * (all following code omitted)
  */
}

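This part shows instantiateFromLoadedImage, the function that instantiates sMainExecutable. All the code before it loads and processes platform and architecture information; call it the preparation stage. It is too long to paste here.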

Part 2:
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, int argc, const char* argv[], const char* envp[], const char* apple[], uintptr_t* startGlue)
{
    /*
     * (all preceding code omitted)
     */

  char dyldPathBuffer[MAXPATHLEN+1];
        
  int len = proc_regionfilename(getpid(), (uint64_t)(long)addressInDyld, dyldPathBuffer, MAXPATHLEN);
        
  if ( len > 0 ) {
    dyldPathBuffer[len] = '\0'; // proc_regionfilename() does not zero terminate returned string
    if ( strcmp(dyldPathBuffer, gProcessInfo->dyldPath) != 0 )  
        gProcessInfo->dyldPath = strdup(dyldPathBuffer);
      }

        /* ********************** load the inserted libraries in a loop *************************/
      // load any inserted libraries
      if    ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
            
           for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)   
               loadInsertedDylib(*lib);
             }
          // record count of inserted libraries so that a flat search will look at     
         // inserted libraries, then main, then others.
        sInsertedDylibCount = sAllImages.size()-1;

        // link main executable
        
        gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
          if ( mainExcutableAlreadyRebased ) {
            // previous link() on main executable has already adjusted its internal pointers for ASLR
            // work around that by rebasing by inverse amount
            sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
        }
#endif
        /* ********************** link the main executable *************************/
        
        link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
        sMainExecutable->setNeverUnloadRecursive();
        if ( sMainExecutable->forceFlat() ) {
            gLinkContext.bindFlat = true;
            gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
        }

        /* ********************** link all inserted libraries *************************/
        // link any inserted libraries
        // do this after linking main executable so that any dylibs pulled in by inserted 
        // dylibs (e.g. libSystem) will not be in front of dylibs the program uses
        if ( sInsertedDylibCount > 0 ) {
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
                image->setNeverUnloadRecursive();
            }
            if ( gLinkContext.allowInterposing ) {
                // only INSERTED libraries can interpose
                // register interposing info after all inserted libraries are bound so chaining works
                for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                    ImageLoader* image = sAllImages[i+1];
                    image->registerInterposing(gLinkContext);
                }
            }
        }

        if ( gLinkContext.allowInterposing ) {
            // <rdar://problem/19315404> dyld should support interposition even without DYLD_INSERT_LIBRARIES
            for (long i=sInsertedDylibCount+1; i < sAllImages.size(); ++i) {
                ImageLoader* image = sAllImages[i];
                if ( image->inSharedCache() )
                    continue;
                image->registerInterposing(gLinkContext);
            }
        }
        

  /*
  * (all following code omitted)
  */
}

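Part 2 covers the inserted dynamic libraries and linking: it first loads every inserted dylib, then links the main executable, and then links the inserted dylibs.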
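The ordering in Part 2 can be sketched with plain strings. `linkOrder` is a hypothetical function, and the sketch assumes that only the main executable and the inserted libraries are in the image list at that point; the dylibs that link() itself pulls in are left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the order in which images get linked, given the main executable
// and the (hypothetical) contents of DYLD_INSERT_LIBRARIES.
std::vector<std::string> linkOrder(const std::string& mainExe,
                                   const std::vector<std::string>& inserted) {
    // The main executable is already the first entry of the image list...
    std::vector<std::string> allImages{mainExe};
    // ...and each inserted library is loaded (not yet linked) right after it.
    for (const std::string& lib : inserted)
        allImages.push_back(lib);
    // Same arithmetic as sInsertedDylibCount = sAllImages.size()-1.
    const std::size_t insertedCount = allImages.size() - 1;

    std::vector<std::string> linked;
    linked.push_back(allImages[0]);                  // link the main executable first
    for (std::size_t i = 0; i < insertedCount; ++i)  // then the inserted libraries
        linked.push_back(allImages[i + 1]);
    return linked;
}
```

Linking the main executable before the inserted libraries matches the comment in the source: dylibs pulled in by an inserted library should not end up in front of the dylibs the program itself uses.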

Part 3:
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, int argc, const char* argv[], const char* envp[], const char* apple[], uintptr_t* startGlue)
{

/*
* (all preceding code omitted)
*/

// apply interposing to initial set of images
        for(int i=0; i < sImageRoots.size(); ++i) {
            sImageRoots[i]->applyInterposing(gLinkContext);
        }
        ImageLoader::applyInterposingToDyldCache(gLinkContext);

        /* ********************** bind and notify for the main executable now that interposing is registered *************************/
        
        // Bind and notify for the main executable now that interposing has been registered
        uint64_t bindMainExecutableStartTime = mach_absolute_time();
        sMainExecutable->recursiveBindWithAccounting(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true);
        uint64_t bindMainExecutableEndTime = mach_absolute_time();
        ImageLoaderMachO::fgTotalBindTime += bindMainExecutableEndTime - bindMainExecutableStartTime;
        gLinkContext.notifyBatch(dyld_image_state_bound, false);

        // Bind and notify for the inserted images now interposing has been registered
        if ( sInsertedDylibCount > 0 ) {
            for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
                ImageLoader* image = sAllImages[i+1];
                image->recursiveBind(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true, nullptr);
            }
        }
        
        /* ****************** weak binding ********************/
        
        // <rdar://problem/12186933> do weak binding only after all inserted images linked
        sMainExecutable->weakBind(gLinkContext);
        gLinkContext.linkingMainExecutable = false;

        sMainExecutable->recursiveMakeDataReadOnly(gLinkContext);

        CRSetCrashLogMessage("dyld: launch, running initializers");
        
/*
* (code below omitted; it is the code shown earlier, from the weak-binding step onward)
*/
        
}

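Part 3 applies interposing, binds the main executable and the inserted images, sends the bound notification, and performs weak binding.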

Part 4:
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, int argc, const char* argv[], const char* envp[], const char* apple[], uintptr_t* startGlue)
{

/*
* (all preceding code omitted)
*/

/* ****************** not iPhone (legacy crt1.o path) ********************/
    #if SUPPORT_OLD_CRT_INITIALIZATION
        // Old way is to run initializers via a callback from crt1.o
        if ( ! gRunInitializersOldWay ) 
            initializeMainExecutable(); 
    #else
        
        /* ****************** iPhone ********************/
        /* ****************** run all initializers (key step) ********************/
        // run all initializers
        initializeMainExecutable(); 
    #endif

        /* ****************** notify all processes monitoring dyld that this process is about to enter main() ********************/
        
        // notify any montoring proccesses that this process is about to enter main()
        notifyMonitoringDyldMain();
        
        if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
            dyld3::kdebug_trace_dyld_duration_end(launchTraceID, DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, 0, 0, 2);
        }
        ARIADNEDBG_CODE(220, 1);

#if TARGET_OS_OSX
        if ( gLinkContext.driverKit ) {
            result = (uintptr_t)sEntryOverride;
            if ( result == 0 )
                halt("no entry point registered");
            *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
        }
        else
#endif
        {
            // find entry point for main executable
            result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
            if ( result != 0 ) {
                // main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
                if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
                    *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
                else
                    halt("libdyld.dylib support not present for LC_MAIN");
            }
            else {
                // main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
                result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
                *startGlue = 0;
            }
        }
    }
/*
* (code below omitted)
*/
        
}

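Part 4 runs all the initializers and notifies every process monitoring dyld that this process is about to enter main. This is the focus of our study, because initializeMainExecutable(); is where control really starts to pass from dyld toward the objc entry point.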

Key point: initializeMainExecutable();

void initializeMainExecutable()
{
    // record that we've reached this step
    gLinkContext.startedInitializingMainExecutable = true;

    
    /* ****************   call runInitializers on the inserted images  *******************/
    
    // run initialzers for any inserted dylibs
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }
    
    /* ****************   run the initializers of the main executable and everything it brings up: sMainExecutable->runInitializers *******************/
    
    // run initializers for main executable and everything it brings up 
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
    
    // register cxa_atexit() handler to run static terminators in all loaded images when this process exits
    if ( gLibSystemHelpers != NULL ) 
        (*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

    // dump info if requested
    if ( sEnv.DYLD_PRINT_STATISTICS )
        ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
    if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
        ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}

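In this function, a for loop first runs the initializers of the inserted images: sImageRoots[i]->runInitializers. Then it runs the initializers of the main executable and everything it brings up: sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);. Both calls go to the same method, so we trace runInitializers: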

ImageLoader::runInitializers

void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.imagesAndPaths[0] = { this, this->getPath() };

    /* ************    prepare and process all the initializers   *************/
    processInitializers(context, thisThread, timingInfo, up);
    
    /* ***********   send the notification       **************/
    context.notifyBatch(dyld_image_state_initialized, false);

    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}

This method mainly calls processInitializers, which prepares and processes all the initializers, and context.notifyBatch(dyld_image_state_initialized, false);, which sends a notification.

Let's look at processInitializers:

// <rdar://problem/14412057> upward dylib initializers can be run too soon
// To handle dangling dylibs which are upward linked but not downward, all upward linked dylibs
// have their initialization postponed until after the recursion through downward dylibs
// has completed.
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                     InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount()+2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    
    /* ***************     call recursive init on every image in the list, building a new list of uninitialized upward dependencies        ******************/
    // Calling recursive init on all images in images list, building a new list of
    // uninitialized upward dependencies.
    for (uintptr_t i=0; i < images.count; ++i) {
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}

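This method uses a for loop to call recursiveInitialization on every image in the list, and then calls itself again for any upward dependencies that are still uninitialized. We continue with recursiveInitialization: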

recursiveInitialization

void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
    recursive_lock lock_info(this_thread);
    recursiveSpinLock(lock_info);

    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        uint8_t oldState = fState;
        // break cycles
        fState = dyld_image_state_dependents_initialized-1;
        try {
            // initialize lower level libraries first
            for(unsigned int i=0; i < libraryCount(); ++i) {
                ImageLoader* dependentImage = libImage(i);
                if ( dependentImage != NULL ) {
                    // don't try to initialize stuff "above" me yet
                    if ( libIsUpward(i) ) {
                        uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
                        uninitUps.count++;
                    }
                    else if ( dependentImage->fDepth >= fDepth ) {
                        dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
                    }
                }
            }
            
            // record termination order
            if ( this->needsTermination() )
                context.terminationRecorder(this);

            /* **********  let objc know we are about to initialize this image: notification via context.notifySingle   ***************/
            // let objc know we are about to initialize this image
            uint64_t t1 = mach_absolute_time();
            fState = dyld_image_state_dependents_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
            
            /* **********  doInitialization: initialize this image; the path into _objc_init() starts here   ***************/
            // initialize this image
            bool hasInitializers = this->doInitialization(context);

            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);
            
            if ( hasInitializers ) {
                uint64_t t2 = mach_absolute_time();
                timingInfo.addTime(this->getShortName(), t2-t1);
            }
        }
        catch (const char* msg) {
            // this image is not initialized
            fState = oldState;
            recursiveSpinUnLock();
            throw;
        }
    }
    
    recursiveSpinUnLock();
}

The key points of ImageLoader::recursiveInitialization are that it tells objc an image is about to be initialized by sending a notification through context.notifySingle, and that it calls doInitialization to initialize this image.
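The "dependencies first, then myself" order of recursiveInitialization can be sketched with a toy image graph. `ToyImage` and `recursiveInit` are hypothetical names; the sketch keeps only the recursion and the cycle-breaking state check, and omits the locking, the deferred upward dependencies and the depth comparison of the real code.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of an image: a name, the images it depends on, and a flag.
struct ToyImage {
    std::string name;
    std::vector<ToyImage*> dependents;
    bool visited = false;  // plays the role of the fState check that breaks cycles
};

// Initialize everything an image depends on before the image itself,
// appending names to `order` in the sequence the initializers would run.
void recursiveInit(ToyImage& image, std::vector<std::string>& order) {
    if (image.visited)
        return;
    image.visited = true;  // set before recursing so dependency cycles terminate
    for (ToyImage* dep : image.dependents)
        recursiveInit(*dep, order);
    // In dyld this is where notifySingle(...) and doInitialization(...) are called.
    order.push_back(image.name);
}
```

With an app that depends on libobjc and libSystem, where libobjc depends on libSystem, the resulting order is libSystem, libobjc, App: the lowest-level library is initialized first even if the graph contains a cycle.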

First let's see what context.notifySingle actually does:

context.notifySingle():

static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
    //dyld::log("notifySingle(state=%d, image=%s)\n", state, image->getPath());
    std::vector<dyld_image_state_change_handler>* handlers = stateToHandlers(state, sSingleHandlers);
    if ( handlers != NULL ) {
        dyld_image_info info;
        info.imageLoadAddress   = image->machHeader();
        info.imageFilePath      = image->getRealPath();
        info.imageFileModDate   = image->lastModified();
        for (std::vector<dyld_image_state_change_handler>::iterator it = handlers->begin(); it != handlers->end(); ++it) {
            const char* result = (*it)(state, 1, &info);
            if ( (result != NULL) && (state == dyld_image_state_mapped) ) {
                //fprintf(stderr, "  image rejected by handler=%p\n", *it);
                // make copy of thrown string so that later catch clauses can free it
                const char* str = strdup(result);
                throw str;
            }
        }
    }
    if ( state == dyld_image_state_mapped ) {
        // <rdar://problem/7008875> Save load addr + UUID for images from outside the shared cache
        // <rdar://problem/50432671> Include UUIDs for shared cache dylibs in all image info when using private mapped shared caches
        if (!image->inSharedCache()
            || (gLinkContext.sharedRegionMode == ImageLoader::kUsePrivateSharedRegion)) {
            dyld_uuid_info info;
            if ( image->getUUID(info.imageUUID) ) {
                info.imageLoadAddress = image->machHeader();
                addNonSharedCacheImageUUID(info);
            }
        }
    }
    
    /* ****** the dyld_image_state_dependents_initialized notification arrives here *********/
    if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
        uint64_t t0 = mach_absolute_time();
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
        /* ***********   sNotifyObjCInit handles the notification for this image     *****************/
        
        /* ***********      static _dyld_objc_notify_init       sNotifyObjCInit;     *****************/
        (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        
        uint64_t t1 = mach_absolute_time();
        uint64_t t2 = mach_absolute_time();
        uint64_t timeInObjC = t1-t0;
        uint64_t emptyTime = (t2-t1)*100;
        if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
            timingInfo->addTime(image->getShortName(), timeInObjC);
        }
    }
    
    // mach message csdlc about dynamically unloaded images
    if ( image->addFuncNotified() && (state == dyld_image_state_terminated) ) {
        notifyKernel(*image, false);
        const struct mach_header* loadAddress[] = { image->machHeader() };
        const char* loadPath[] = { image->getPath() };
        notifyMonitoringDyld(true, 1, loadAddress, loadPath);
    }
}

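Here is the implementation of notifySingle(). For the state passed by the caller above, dyld_image_state_dependents_initialized, the notification is delivered by the call (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());. So we look up sNotifyObjCInit next. A global search finds the following code: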

static _dyld_objc_notify_init       sNotifyObjCInit;

// _dyld_objc_notify_init
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped   = mapped;
    sNotifyObjCInit     = init;
    sNotifyObjCUnmapped = unmapped;

    // call 'mapped' function with all images mapped so far
    try {
        notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
    }
    catch (const char* msg) {
        // ignore request to abort during registration
    }

    // <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
    for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
        ImageLoader* image = *it;
        if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
            dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
            (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        }
    }
}

In other words, sNotifyObjCInit is just a variable of type _dyld_objc_notify_init, and it is assigned in registerObjCNotifiers(): sNotifyObjCInit = init;. So we keep working backwards and trace registerObjCNotifiers().
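The pattern here is ordinary callback registration: one side stores a function pointer, the other side calls through it later. A minimal sketch follows, with hypothetical names throughout (`InitCallback`, `registerInitCallback`, `notifyImageInit`, `toyLoadImages`) and a callback signature simplified to a single path argument.

```cpp
#include <string>
#include <vector>

// Hypothetical callback type, shaped like _dyld_objc_notify_init but simplified.
using InitCallback = void (*)(const char* path);

static InitCallback sInitCallback = nullptr;     // plays the role of sNotifyObjCInit
static std::vector<std::string> sNotifiedPaths;  // records what the callback saw

// Loader side: remember the function that the runtime hands over.
void registerInitCallback(InitCallback init) {
    sInitCallback = init;
}

// Loader side: called when an image is about to be initialized.
void notifyImageInit(const char* path) {
    if (sInitCallback != nullptr)
        (*sInitCallback)(path);
}

// Runtime side: stands in for whatever function the runtime registers.
void toyLoadImages(const char* path) {
    sNotifiedPaths.push_back(path);
}
```

Before `registerInitCallback(&toyLoadImages)` is called, `notifyImageInit` does nothing, which is why the `sNotifyObjCInit != NULL` check in notifySingle matters: images initialized before registration are not reported through this path.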

_dyld_objc_notify_register()

// _dyld_objc_notify_register
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}

A global search shows that the function above calls registerObjCNotifiers(). But when we search for _dyld_objc_notify_register() itself, the dyld source offers no caller to follow, so this trail ends here. Let's turn back and trace doInitialization instead:

ImageLoaderMachO::doInitialization:

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    CRSetCrashLogMessage2(this->getPath());

    // mach-o has -init and static initializers
    doImageInit(context);
    doModInitFunctions(context);
    
    CRSetCrashLogMessage2(NULL);
    
    return (fHasDashInit || fHasInitializers);
}

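This method handles the two kinds of Mach-O initializers named in its comment: doImageInit(context); runs the image's -init routine, and doModInitFunctions(context); calls its static initializer functions. Let's look at doModInitFunctions: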

ImageLoaderMachO::doModInitFunctions:

void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
    if ( fHasInitializers ) {
        const uint32_t cmd_count = ((macho_header*)fMachOData)->ncmds;
        const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
        const struct load_command* cmd = cmds;
        for (uint32_t i = 0; i < cmd_count; ++i) {
            if ( cmd->cmd == LC_SEGMENT_COMMAND ) {
                const struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
                const struct macho_section* const sectionsStart = (struct macho_section*)((char*)seg + sizeof(struct macho_segment_command));
                const struct macho_section* const sectionsEnd = &sectionsStart[seg->nsects];
                for (const struct macho_section* sect=sectionsStart; sect < sectionsEnd; ++sect) {
                    const uint8_t type = sect->flags & SECTION_TYPE;
                    if ( type == S_MOD_INIT_FUNC_POINTERS ) {
                        Initializer* inits = (Initializer*)(sect->addr + fSlide);
                        const size_t count = sect->size / sizeof(uintptr_t);
                        // <rdar://problem/23929217> Ensure __mod_init_func section is within segment
                        if ( (sect->addr < seg->vmaddr) || (sect->addr+sect->size > seg->vmaddr+seg->vmsize) || (sect->addr+sect->size < sect->addr) )
                            dyld::throwf("__mod_init_funcs section has malformed address range for %s\n", this->getPath());
                        for (size_t j=0; j < count; ++j) {
                            Initializer func = inits[j];
                            // <rdar://problem/8543820&9228031> verify initializers are in image
                            if ( ! this->containsAddress(stripPointer((void*)func)) ) {
                                dyld::throwf("initializer function %p not in mapped image for %s\n", func, this->getPath());
                            }
                            if ( ! dyld::gProcessInfo->libSystemInitialized ) {
                                // <rdar://problem/17973316> libSystem initializer must run first
                                const char* installPath = getInstallPath();
                                if ( (installPath == NULL) || (strcmp(installPath, libSystemPath(context)) != 0) )
                                    dyld::throwf("initializer in image (%s) that does not link with libSystem.dylib\n", this->getPath());
                            }
                            if ( context.verboseInit )
                                dyld::log("dyld: calling initializer function %p in %s\n", func, this->getPath());
                            bool haveLibSystemHelpersBefore = (dyld::gLibSystemHelpers != NULL);
                            {
                                dyld3::ScopedTimer(DBG_DYLD_TIMING_STATIC_INITIALIZER, (uint64_t)fMachOData, (uint64_t)func, 0);
                                func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
                            }
                            bool haveLibSystemHelpersAfter = (dyld::gLibSystemHelpers != NULL);
                            if ( !haveLibSystemHelpersBefore && haveLibSystemHelpersAfter ) {
                                // now safe to use malloc() and other calls in libSystem.dylib
                                dyld::gProcessInfo->libSystemInitialized = true;
                            }
                        }
                    }
                    else if ( type == S_INIT_FUNC_OFFSETS ) {
                        const uint32_t* inits = (uint32_t*)(sect->addr + fSlide);
                        const size_t count = sect->size / sizeof(uint32_t);
                        // Ensure section is within segment
                        if ( (sect->addr < seg->vmaddr) || (sect->addr+sect->size > seg->vmaddr+seg->vmsize) || (sect->addr+sect->size < sect->addr) )
                            dyld::throwf("__init_offsets section has malformed address range for %s\n", this->getPath());
                        if ( seg->initprot & VM_PROT_WRITE )
                            dyld::throwf("__init_offsets section is not in read-only segment %s\n", this->getPath());
                        for (size_t j=0; j < count; ++j) {
                            uint32_t funcOffset = inits[j];
                            // verify initializers are in image
                            if ( ! this->containsAddress((uint8_t*)this->machHeader() + funcOffset) ) {
                                dyld::throwf("initializer function offset 0x%08X not in mapped image for %s\n", funcOffset, this->getPath());
                            }
                            if ( ! dyld::gProcessInfo->libSystemInitialized ) {
                                // <rdar://problem/17973316> libSystem initializer must run first
                                const char* installPath = getInstallPath();
                                if ( (installPath == NULL) || (strcmp(installPath, libSystemPath(context)) != 0) )
                                    dyld::throwf("initializer in image (%s) that does not link with libSystem.dylib\n", this->getPath());
                            }
                            Initializer func = (Initializer)((uint8_t*)this->machHeader() + funcOffset);
                            if ( context.verboseInit )
                                dyld::log("dyld: calling initializer function %p in %s\n", func, this->getPath());
#if __has_feature(ptrauth_calls)
                            func = (Initializer)__builtin_ptrauth_sign_unauthenticated((void*)func, ptrauth_key_asia, 0);
#endif
                            bool haveLibSystemHelpersBefore = (dyld::gLibSystemHelpers != NULL);
                            {
                                dyld3::ScopedTimer(DBG_DYLD_TIMING_STATIC_INITIALIZER, (uint64_t)fMachOData, (uint64_t)func, 0);
                                func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
                            }
                            bool haveLibSystemHelpersAfter = (dyld::gLibSystemHelpers != NULL);
                            if ( !haveLibSystemHelpersBefore && haveLibSystemHelpersAfter ) {
                                // now safe to use malloc() and other calls in libSystem.dylib
                                dyld::gProcessInfo->libSystemInitialized = true;
                            }
                        }
                    }
                }
            }
            cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
        }
    }
}

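At this point the trail goes cold. Nothing in this method gives us a clear next step to follow, and so far we have only located the dyld frames that bt printed earlier. How the system gets from dyld into objc cannot be worked out from here. So let's change direction: build and run the objc source directly, set a breakpoint on _objc_init, and look at the stack leading up to it to see what runs before objc is entered. See the figure below: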

9.png

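As the figure shows, this approach works. Before reaching _objc_init the stack passes through two more libraries, libSystem and libdispatch. Moreover, libSystem's libSystem_initializer is called directly from ImageLoaderMachO::doModInitFunctions, the method where we got stuck, so the flow seems to connect back up. Next we download the libSystem source, along with the libdispatch source that we will need shortly.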

10.png

Open the libdispatch source and search for the _os_object_init method seen in the stack trace:
_os_object_init(void)

void
_os_object_init(void)
{
    _objc_init();
    Block_callbacks_RR callbacks = {
        sizeof(Block_callbacks_RR),
        (void (*)(const void *))&objc_retain,
        (void (*)(const void *))&objc_release,
        (void (*)(const void *))&_os_objc_destructInstance
    };
    _Block_use_RR2(&callbacks);
#if DISPATCH_COCOA_COMPAT
    const char *v = getenv("OBJC_DEBUG_MISSING_POOLS");
    if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
    v = getenv("DISPATCH_DEBUG_MISSING_POOLS");
    if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
    v = getenv("LIBDISPATCH_DEBUG_MISSING_POOLS");
    if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
#endif
}

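This method does call _objc_init, which initializes objc and takes us into the objc source. We can verify that _objc_init really comes from objc: a global search for _objc_init turns up the following code: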

#if __has_include(<objc/objc-internal.h>)
#include <objc/objc-internal.h>
#else
extern id _Nullable objc_retain(id _Nullable obj) __asm__("_objc_retain");
extern void objc_release(id _Nullable obj) __asm__("_objc_release");
extern void _objc_init(void);
extern void _objc_atfork_prepare(void);
extern void _objc_atfork_parent(void);
extern void _objc_atfork_child(void);
#endif // __has_include(<objc/objc-internal.h>)
#include <objc/objc-exception.h>
#include <Foundation/NSString.h>

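This code shows that the _objc_init() called in _os_object_init is indeed brought in from objc, either through the objc/objc-internal.h header or, when that header is unavailable, through the fallback extern declaration. That links the objc source to libdispatch. Next, let's find out who calls _os_object_init.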
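The header-or-fallback pattern in that snippet can be tried in isolation. Below is a minimal sketch, assuming a compiler that supports `__has_include` (Clang, GCC, or any C++17 compiler). The header path `sketch/not-a-real-header.h`, the macro `SKETCH_USED_HEADER` and the function `sketch_runtime_init` are all hypothetical names made up for illustration; they are not part of libdispatch or objc.

```cpp
#include <cassert>

// If the (hypothetical) header exists, use its declarations; otherwise fall
// back to declaring the symbol by hand, as libdispatch does for _objc_init.
#if __has_include(<sketch/not-a-real-header.h>)
#include <sketch/not-a-real-header.h>
#define SKETCH_USED_HEADER 1
#else
extern "C" int sketch_runtime_init(void);  // fallback declaration
#define SKETCH_USED_HEADER 0
#endif

// In the real case another library (objc) provides the definition. It is
// defined here only so that this sketch links on its own.
extern "C" int sketch_runtime_init(void) {
    return 42;
}
```

Either way the caller ends up with the same declaration, so the call site does not need to know which branch was taken.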

DISPATCH_EXPORT DISPATCH_NOTHROW
void
libdispatch_init(void)
{
    /*
     * earlier code omitted
     */
    _dispatch_hw_config_init();
    _dispatch_time_init();
    _dispatch_vtable_init();
    _os_object_init();
    _voucher_init();
    _dispatch_introspection_init();
}

A global search shows that libdispatch_init is what calls _os_object_init(), which matches the stack that bt printed when we ran the objc source.

In that stack trace, the frame before libdispatch_init is no longer in libdispatch but in libSystem. So we switch to the libSystem source and look for libdispatch_init there.

A global search for libdispatch_init in libSystem finds this declaration:

extern void libdispatch_init(void);     // from libdispatch.dylib

and this call:

// libsyscall_initializer() initializes all of libSystem.dylib
// <rdar://problem/4892197>
__attribute__((constructor))
static void
libSystem_initializer(int argc,
              const char* argv[],
              const char* envp[],
              const char* apple[],
              const struct ProgramVars* vars)
{
    static const struct _libkernel_functions libkernel_funcs = {
        .version = 4,
        // V1 functions

        /*
         * intermediate code omitted
         */
    libdispatch_init();
    _libSystem_ktrace_init_func(LIBDISPATCH);

        /*
         * later code omitted
         */
    _libSystem_ktrace0(ARIADNE_LIFECYCLE_libsystem_init | DBG_FUNC_END);

    /* <rdar://problem/11588042>
     * C99 standard has the following in section 7.5(3):
     * "The value of errno is zero at program startup, but is never set
     * to zero by any library function."
     */
    errno = 0;
}

The code above shows that libdispatch's init function is indeed declared and called in libSystem.
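The reason libSystem_initializer runs before main at all is the `__attribute__((constructor))` annotation on it: the compiler records such a function in the image's initializer section, and the loader calls it during image initialization. Here is a minimal sketch of that behaviour, assuming Clang or GCC. The names `demo_initializer` and `initializer_run_count` are hypothetical, and the real libSystem_initializer additionally receives argc, argv, envp, apple and a ProgramVars pointer.

```cpp
#include <cassert>

// Zero-initialized at load time, before any initializer runs.
static int g_initializer_runs = 0;

// Hypothetical stand-in for libSystem_initializer: the loader calls this
// function before main() because of the constructor attribute.
__attribute__((constructor))
static void demo_initializer(void) {
    ++g_initializer_runs;
}

// Lets code running in or after main() check that the initializer already ran.
int initializer_run_count() {
    return g_initializer_runs;
}
```

By the time main starts, `initializer_run_count()` is already 1, even though nothing in the program calls `demo_initializer` explicitly.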

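We have now traced the flow from both ends. On one side we have libSystem -> libdispatch -> objc; on the dyld side we have dyldbootstrap::start -> _dyld_objc_notify_register -> doModInitFunctions. What is still missing is the step from dyld into libSystem. So we return to dyld and look for any place that touches libSystem, especially from doModInitFunctions onward.

We go back to ImageLoaderMachO::doModInitFunctions, where the trail broke off, because the stack trace also shows libSystem being entered right after this method.

ps: The complete method is listed above, so only the part relevant to entering libSystem is shown here.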
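To make the expected nesting concrete, here is a small sketch that models the chain seen in the stack trace as plain function calls and records the order in which they are entered. Every function below is a hypothetical stand-in with a `sketch_` prefix; none of this is the real dyld, libSystem, libdispatch or objc code, and the real functions do much more work.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Order in which the stand-in functions are entered.
static std::vector<std::string> g_trace;

static void sketch_objc_init() {
    g_trace.push_back("_objc_init");
}

static void sketch_os_object_init() {
    g_trace.push_back("_os_object_init");
    sketch_objc_init();  // libdispatch -> objc
}

static void sketch_libdispatch_init() {
    g_trace.push_back("libdispatch_init");
    sketch_os_object_init();
}

static void sketch_libSystem_initializer() {
    g_trace.push_back("libSystem_initializer");
    sketch_libdispatch_init();  // libSystem -> libdispatch
}

static void sketch_doModInitFunctions() {
    g_trace.push_back("ImageLoaderMachO::doModInitFunctions");
    sketch_libSystem_initializer();  // dyld -> libSystem (the step still to confirm)
}

// Runs the whole chain once and returns the recorded order.
std::vector<std::string> run_startup_sketch() {
    g_trace.clear();
    sketch_doModInitFunctions();
    return g_trace;
}
```

Reading the returned vector from front to back gives the same order as reading the bt output from the bottom frame upward.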

ImageLoaderMachO::doModInitFunctions

void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
    if ( fHasInitializers ) {
        const uint32_t cmd_count = ((macho_header*)fMachOData)->ncmds;
        const struct load_command* const cmds = (struct load_command*)&fMachOData[sizeof(macho_header)];
        const struct load_command* cmd = cmds;
        for (uint32_t i = 0; i < cmd_count; ++i) {

                // part of the code omitted
                for (const struct macho_section* sect=sectionsStart; sect < sectionsEnd; ++sect) {
                    const uint8_t type = sect->flags & SECTION_TYPE;
                    if ( type == S_MOD_INIT_FUNC_POINTERS ) {
                        // part of the code omitted
                    }
                    else if ( type == S_INIT_FUNC_OFFSETS ) {

                        // part of the code omitted

                            if ( ! dyld::gProcessInfo->libSystemInitialized ) {
                                // <rdar://problem/17973316> libSystem initializer must run first
                                const char* installPath = getInstallPath();
                                if ( (installPath == NULL) || (strcmp(installPath, libSystemPath(context)) != 0) )
                                    dyld::throwf("initializer in image (%s) that does not link with libSystem.dylib\n", this->getPath());
                            }
                            Initializer func = (Initializer)((uint8_t*)this->machHeader() + funcOffset);
                            if ( context.verboseInit )
                                dyld::log("dyld: calling initializer function %p in %s\n", func, this->getPath());
#if __has_feature(ptrauth_calls)
                            func = (Initializer)__builtin_ptrauth_sign_unauthenticated((void*)func, ptrauth_key_asia, 0);
#endif
                            bool haveLibSystemHelpersBefore = (dyld::gLibSystemHelpers != NULL);
                            {
                                dyld3::ScopedTimer(DBG_DYLD_TIMING_STATIC_INITIALIZER, (uint64_t)fMachOData, (uint64_t)func, 0);
                                func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
                            }
                            bool haveLibSystemHelpersAfter = (dyld::gLibSystemHelpers != NULL);
                            if ( !haveLibSystemHelpersBefore && haveLibSystemHelpersAfter ) {
                                // now safe to use malloc() and other calls in libSystem.dylib
                                dyld::gProcessInfo->libSystemInitialized = true;
                            }
                        }
                    }
                }
            }
            cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
        }
    }
}

Looking closely at the simplified method above, we find this statement and comment:

if ( ! dyld::gProcessInfo->libSystemInitialized ) {
    // <rdar://problem/17973316> libSystem initializer must run first
    /*
     * code omitted
     */
}

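This means that until libSystem has been initialized, the only image allowed to run an initializer is libSystem itself; for any other image dyld throws. In other words, libSystem must be initialized first. Right below the check is this line: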

Initializer func = (Initializer)((uint8_t*)this->machHeader() + funcOffset);

It builds a function pointer from the machHeader address plus an offset, and the code below then calls it:

#if __has_feature(ptrauth_calls)
func = (Initializer)__builtin_ptrauth_sign_unauthenticated((void*)func, ptrauth_key_asia, 0);
#endif
bool haveLibSystemHelpersBefore = (dyld::gLibSystemHelpers != NULL);
{
    dyld3::ScopedTimer(DBG_DYLD_TIMING_STATIC_INITIALIZER, (uint64_t)fMachOData, (uint64_t)func, 0);
    func(context.argc, context.argv, context.envp, context.apple, &context.programVars);
}
bool haveLibSystemHelpersAfter = (dyld::gLibSystemHelpers != NULL);
if ( !haveLibSystemHelpersBefore && haveLibSystemHelpersAfter ) {
    // now safe to use malloc() and other calls in libSystem.dylib
    dyld::gProcessInfo->libSystemInitialized = true;
}
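The "header address plus offset" idea can be illustrated without any Mach-O machinery. The sketch below stores each function as an offset from a base address and turns the offset back into a callable pointer before invoking it. All names (`sketch_image_base`, `offset_of`, `resolve`, `run_offset_table`) are hypothetical, the offsets are `uintptr_t` rather than the `uint32_t` used by dyld so that the sketch works whatever the memory layout, and the offsets are computed at run time instead of being read from an `__init_offsets` section.

```cpp
#include <cassert>
#include <cstdint>

using SketchInitializer = int (*)(int);

static int sketch_init_a(int x) { return x + 1; }
static int sketch_init_b(int x) { return x * 2; }

// Stands in for the address returned by machHeader().
static const char sketch_image_base = 0;

static std::uintptr_t sketch_base() {
    return reinterpret_cast<std::uintptr_t>(&sketch_image_base);
}

// What a linker would record: the function's distance from the base.
static std::uintptr_t offset_of(SketchInitializer fn) {
    return reinterpret_cast<std::uintptr_t>(fn) - sketch_base();
}

// What the loader does: base + offset, cast back to a function pointer.
static SketchInitializer resolve(std::uintptr_t offset) {
    return reinterpret_cast<SketchInitializer>(sketch_base() + offset);
}

// Walks the offset table in order, calling each resolved function.
int run_offset_table(int seed) {
    const std::uintptr_t offsets[] = { offset_of(&sketch_init_a),
                                       offset_of(&sketch_init_b) };
    int value = seed;
    for (std::uintptr_t off : offsets) {
        value = resolve(off)(value);
    }
    return value;
}
```

For example, `run_offset_table(3)` calls `sketch_init_a` and then `sketch_init_b`, giving (3 + 1) * 2 = 8. Storing offsets instead of absolute pointers means the table does not have to be rewritten when the image is loaded at a different address, which fits with the check in the full listing that the `__init_offsets` section lives in a read-only segment.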

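So certain code cannot run until libSystem has been initialized. The func pointer is computed from machHeader, and once the call to func has made dyld::gLibSystemHelpers non-NULL, dyld::gProcessInfo->libSystemInitialized is set to true. We can therefore conclude that this is where libSystem gets initialized, which ties the whole flow together. The flow spans four system libraries: dyld -> libSystem -> libdispatch -> objc.

The overall application loading flow is shown in the diagram below: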
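The gating logic can be condensed into a few lines. The following is a simplified model, not dyld code: `SketchProcessState`, `SketchImage`, `runInitializer` and the two helper functions are hypothetical names, `helpersRegistered` stands in for `dyld::gLibSystemHelpers != NULL`, and the path constant is only a placeholder for whatever `libSystemPath(context)` returns.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

struct SketchProcessState {
    bool libSystemInitialized = false;
    bool helpersRegistered = false;  // models dyld::gLibSystemHelpers != NULL
};

struct SketchImage {
    std::string installPath;
    std::function<void(SketchProcessState&)> initializer;
};

// Placeholder for the value of libSystemPath(context).
static const std::string kSketchLibSystemPath = "/usr/lib/libSystem.B.dylib";

void runInitializer(const SketchImage& image, SketchProcessState& state) {
    // Until libSystem is initialized, only libSystem itself may run an initializer.
    if (!state.libSystemInitialized && image.installPath != kSketchLibSystemPath) {
        throw std::runtime_error("initializer in image (" + image.installPath +
                                 ") that does not link with libSystem.dylib");
    }
    const bool before = state.helpersRegistered;
    image.initializer(state);
    const bool after = state.helpersRegistered;
    // The flag flips only when this call is the one that registered the helpers.
    if (!before && after) {
        state.libSystemInitialized = true;
    }
}

// An app image that tries to initialize first is rejected.
bool appBeforeLibSystemIsRejected() {
    SketchProcessState state;
    SketchImage app{"/path/to/App", [](SketchProcessState&) {}};
    try {
        runInitializer(app, state);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

// libSystem first, then the app: both succeed and the flag ends up set.
bool libSystemFirstSucceeds() {
    SketchProcessState state;
    SketchImage libSystem{kSketchLibSystemPath,
                          [](SketchProcessState& s) { s.helpersRegistered = true; }};
    SketchImage app{"/path/to/App", [](SketchProcessState&) {}};
    runInitializer(libSystem, state);
    runInitializer(app, state);
    return state.libSystemInitialized;
}
```

The before/after comparison is the key design point: dyld does not ask the initializer whether it belongs to libSystem, it observes the side effect (the helpers being registered) and flips the flag from that.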
dyld&libSystem&libDispatch&objc流程图.png

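That is the end of this article. It took me quite a while to write. I hope it helps me understand this flow better, and if it gives you some insight too, I will be all the happier.

When in doubt, ask the spring breeze. We learn by standing on the shoulders of giants; if anything here is missing or wrong, please point it out. Thank you!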
