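dyld Flow Analysis
Compilation flow
Before analyzing dyld, let's first walk through how an executable is built:
As the figure above shows, the source files we write are first preprocessed, then go through lexical and syntactic analysis and compilation, and are then assembled into object files. Finally, the linker links those object files together with any static libraries they reference to produce the executable.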
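Static libraries
Static library formats: .a and .framework
When a static library is linked, the .o object files produced by compilation are linked into the program together with the libraries they reference. This is called static linking.
If several programs use the same static library B, each program carries its own copy, so B ends up in memory multiple times, which wastes memory and hurts performance.
Characteristics of static libraries:
- Linking against the library is completed at build time, so the code loads quickly at run time
- The resulting binary is larger, which costs performance and memory
- Updates are all-or-nothing: any change to the library means rebuilding and redistributing the whole program, which makes updating, deploying and releasing inconvenient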
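Dynamic libraries
Dynamic library formats: .dylib, .tbd
A dynamic library is not linked into the target file at build time; it is loaded only while the program is being loaded. In memory, dynamic libraries are kept in the shared cache, so only one copy exists, and it is loaded as a shared library instance.
Characteristics of dynamic libraries:
- Loaded at run time, which reduces the size of the target file
- Shared library, shared memory: resources are saved
- Incremental updates: upgrading the program becomes simple and does not require recompiling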
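To make the memory argument concrete, here is a minimal sketch of the two cost models described above. The function names and the idea of measuring a library by a single size value are assumptions made for illustration; real memory accounting is more involved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: every process that statically links the library
// carries its own copy of the library's code.
std::size_t staticFootprint(std::size_t librarySize, std::size_t processCount) {
    return librarySize * processCount;
}

// Hypothetical model: a dynamic library kept in the shared cache exists once,
// no matter how many processes use it.
std::size_t dynamicFootprint(std::size_t librarySize, std::size_t processCount) {
    return processCount == 0 ? 0 : librarySize;
}
```

With a library of size 2 used by 5 processes, the static model yields 10 while the dynamic model stays at 2.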
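The products of the build end up scattered across memory when they are loaded, so how are they initialized, loaded and used? That brings us to the topic of the rest of this article: dyld.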
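dyld, the dynamic linker
dyld is Apple's dynamic linker and an important part of Apple's operating systems. After an app has been compiled and packaged into a Mach-O executable, dyld is responsible for linking and loading the program.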
We download the latest dyld source to follow along.
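Demo: implement the +load method in ViewController.m and a C++ function in main.m. The printed output shows the following order: the method in ViewController.m runs first, then the C++ function, and finally the main function.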
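The demo's code is not shown in the text, so the following is only a guess at what the C++ part could look like: a function marked with the GCC/Clang constructor attribute, which the loader runs before main. The names launchLog, cxxInitializer and recordMainEntered are hypothetical, and the Objective-C +load part is not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records launch events in the order they happen. A function-local static is
// used so the vector exists even when it is first touched before main().
std::vector<std::string>& launchLog() {
    static std::vector<std::string> log;
    return log;
}

// GCC/Clang extension: functions marked as constructors are run during the
// loader's initializer pass, i.e. before main() is entered.
__attribute__((constructor))
static void cxxInitializer() {
    launchLog().push_back("cxx constructor");
}

// Call this as the first statement of main() to record when main starts.
void recordMainEntered() {
    launchLog().push_back("main");
}
```

If recordMainEntered is called at the top of main, the log holds the constructor entry first and the main entry second.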
The rest of this walkthrough focuses on that question. We set a breakpoint in the +load method in ViewController.m and examine the app's whole launch flow.
The figure above shows that the program's entry function is _dyld_start.
A global search for _dyld_start (figure above), taking the __x86_64__ architecture as an example, shows that it goes on to call dyldbootstrap::start.
dyldbootstrap::start
//
// This is code to bootstrap dyld. This work in normally done for a program by dyld and crt.
// In dyld we have to do this manually.
//
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
// Emit kdebug tracepoint to indicate dyld bootstrap has started <rdar://46878536>
dyld3::kdebug_trace_dyld_marker(DBG_DYLD_TIMING_BOOTSTRAP_START, 0, 0, 0, 0);
// if kernel had to slide dyld, we need to fix up load sensitive locations
// we have to do this before using any global variables
rebaseDyld(dyldsMachHeader);
// kernel sets up env pointer to be just past end of agv array
const char** envp = &argv[argc+1];
// kernel sets up apple pointer to be just past end of envp array
const char** apple = envp;
while(*apple != NULL) { ++apple; }
++apple;
// set up random value for stack canary
__guard_setup(apple);
#if DYLD_INITIALIZER_SUPPORT
// run all C++ initializers inside dyld
runDyldInitializers(argc, argv, envp, apple);
#endif
// now that we are done bootstrapping dyld, call dyld's main
uintptr_t appsSlide = appsMachHeader->getSlide();
return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
Both the dyldbootstrap::start source and the dyld launch flow figure show that it ends by calling dyld::_main.
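The pointer arithmetic that start uses to find envp and apple relies on the kernel placing argv, the environment strings and the apple strings in one contiguous, NULL-separated array. A small sketch of that derivation follows; BootstrapVectors and locateVectors are hypothetical names, and the function only works on an array laid out in that way.

```cpp
#include <cassert>
#include <cstddef>

struct BootstrapVectors {
    const char** envp;   // first environment string
    const char** apple;  // first "apple" string
};

// argv must point into one contiguous, kernel-style array:
//   argv[0..argc-1], NULL, env strings..., NULL, apple strings..., NULL
BootstrapVectors locateVectors(int argc, const char* argv[]) {
    assert(argv[argc] == nullptr);          // argv is NULL-terminated
    const char** envp = &argv[argc + 1];    // env starts just past that NULL
    const char** apple = envp;
    while (*apple != nullptr) { ++apple; }  // skip every environment string
    ++apple;                                // step over envp's terminating NULL
    return { envp, apple };
}
```

Passing an array that is not laid out this way would walk past its end, which is why the real code can only do this with the vector handed over by the kernel.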
dyld::_main
The source is too long to reproduce here, so only the main steps are listed; you can download the source and follow along:
- 1. Environment configuration: version information, platform information, simulator, setting up the context, and so on
- 2. Set up the shared cache: mapSharedCache
- 3. Initialize the main program: sMainExecutable is assigned by calling instantiateFromLoadedImage
  - 3.1. instantiateFromLoadedImage calls ImageLoaderMachO::instantiateMainExecutable to obtain the processed image
    - 3.1.1. ImageLoaderMachO::instantiateMainExecutable calls sniffLoadCommands to work out the main program's Mach-O layout
  - 3.2. The image is added to the current images list
- 4. Insert dynamic libraries: loadInsertedDylib
- 5. Link the main program: link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
- 6. Link the inserted images in a for loop: link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
- 7. Run the initializers that must finish before main is entered: initializeMainExecutable();
initializeMainExecutable flow analysis
- 1. initializeMainExecutable calls runInitializers, first for every inserted dylib and then for the main executable
void initializeMainExecutable()
{
// record that we've reached this step
gLinkContext.startedInitializingMainExecutable = true;
// run initialzers for any inserted dylibs
ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
initializerTimes[0].count = 0;
const size_t rootCount = sImageRoots.size();
if ( rootCount > 1 ) {
for(size_t i=1; i < rootCount; ++i) {
sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
}
}
// run initializers for main executable and everything it brings up
sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
// register cxa_atexit() handler to run static terminators in all loaded images when this process exits
if ( gLibSystemHelpers != NULL )
(*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);
// dump info if requested
if ( sEnv.DYLD_PRINT_STATISTICS )
ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}
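The ordering in the listing above can be reduced to a small sketch: index 0 of the root list is assumed to be the main executable and the remaining entries the inserted dylibs, as in the loop over sImageRoots. ToyRoot and initializeRoots are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an ImageLoader root: a name plus its initializer.
struct ToyRoot {
    std::string name;
    std::function<void()> runInitializers;
};

// Mirrors the loop shape of initializeMainExecutable(): roots[0] is assumed to
// be the main executable and roots[1..] the inserted dylibs.
void initializeRoots(const std::vector<ToyRoot>& roots) {
    // Inserted dylibs are initialized first.
    for (std::size_t i = 1; i < roots.size(); ++i) {
        roots[i].runInitializers();
    }
    // The main executable (and everything it brings up) goes last.
    if (!roots.empty()) {
        roots[0].runInitializers();
    }
}
```

The sketch leaves out timing, the atexit registration and the statistics output of the real function.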
- 2. Next, runInitializers calls processInitializers to initialize the images
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
uint64_t t1 = mach_absolute_time();
mach_port_t thisThread = mach_thread_self();
ImageLoader::UninitedUpwards up;
up.count = 1;
up.imagesAndPaths[0] = { this, this->getPath() };
processInitializers(context, thisThread, timingInfo, up);
context.notifyBatch(dyld_image_state_initialized, false);
mach_port_deallocate(mach_task_self(), thisThread);
uint64_t t2 = mach_absolute_time();
fgTotalInitTime += (t2 - t1);
}
- 3. processInitializers calls recursiveInitialization in a loop to initialize each image (an image may itself reference other images)
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
uint32_t maxImageCount = context.imageCount()+2;
ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
ImageLoader::UninitedUpwards& ups = upsBuffer[0];
ups.count = 0;
// Calling recursive init on all images in images list, building a new list of
// uninitialized upward dependencies.
// An image may reference other images, i.e. a library inside a library
for (uintptr_t i=0; i < images.count; ++i) {
images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
}
// If any upward dependencies remain, init them.
if ( ups.count > 0 )
processInitializers(context, thisThread, timingInfo, ups);
}
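The way processInitializers defers "upward" dependencies to a later pass can be sketched with a toy dependency graph. ToyLib, initOne and processBatch are hypothetical names; the real code uses fixed-size UninitedUpwards buffers, locking and state values instead of a single flag.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyLib {
    std::string name;
    std::vector<ToyLib*> downward;  // ordinary dependencies, initialized first
    std::vector<ToyLib*> upward;    // "upward" dependencies, deferred
    bool started = false;           // set on entry so cycles terminate
};

// Initializes one library: downward dependencies first, upward ones collected.
void initOne(ToyLib& lib, std::vector<ToyLib*>& deferredUps,
             std::vector<std::string>& order) {
    if (lib.started) return;
    lib.started = true;
    for (ToyLib* dep : lib.downward) initOne(*dep, deferredUps, order);
    for (ToyLib* up : lib.upward) deferredUps.push_back(up);
    order.push_back(lib.name);
}

// Initializes a batch, then recurses on whatever upward dependencies remain.
void processBatch(const std::vector<ToyLib*>& batch,
                  std::vector<std::string>& order) {
    std::vector<ToyLib*> deferredUps;
    for (ToyLib* lib : batch) initOne(*lib, deferredUps, order);
    if (!deferredUps.empty()) processBatch(deferredUps, order);
}
```

For a library A with a downward dependency B and an upward dependency U, the resulting order is B, A, U.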
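- 4. recursiveInitialization first recursively runs the initializers of the dylibs that the current image depends on, and only then calls doInitialization to run the image's own initializers. Whenever the image's state changes along the way, notifySingle reports the change to the surrounding context (any handler registered for that notification has its callback executed)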
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
recursive_lock lock_info(this_thread);
recursiveSpinLock(lock_info);
if ( fState < dyld_image_state_dependents_initialized-1 ) {
uint8_t oldState = fState;
// break cycles
fState = dyld_image_state_dependents_initialized-1;
try {
// initialize lower level libraries first
for(unsigned int i=0; i < libraryCount(); ++i) {
ImageLoader* dependentImage = libImage(i);
if ( dependentImage != NULL ) {
// don't try to initialize stuff "above" me yet
if ( libIsUpward(i) ) {
uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
uninitUps.count++;
}
else if ( dependentImage->fDepth >= fDepth ) {
dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
}
}
}
// record termination order
if ( this->needsTermination() )
context.terminationRecorder(this);
// let objc know we are about to initialize this image
uint64_t t1 = mach_absolute_time();
fState = dyld_image_state_dependents_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
// initialize this image
bool hasInitializers = this->doInitialization(context);
// let anyone know we finished initializing this image
fState = dyld_image_state_initialized;
oldState = fState;
context.notifySingle(dyld_image_state_initialized, this, NULL);
if ( hasInitializers ) {
uint64_t t2 = mach_absolute_time();
timingInfo.addTime(this->getShortName(), t2-t1);
}
}
catch (const char* msg) {
// this image is not initialized
fState = oldState;
recursiveSpinUnLock();
throw;
}
}
recursiveSpinUnLock();
}
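Stripped of locking, timing and upward dependencies, the control flow of recursiveInitialization is a depth-first walk that notifies before and after running the image's own initializers. The following is a simplified sketch under those assumptions; ToyImage, ToyState and recursiveInit are hypothetical names, and the event strings only label the corresponding steps.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class ToyState { Bound, Initializing, DependentsInitialized, Initialized };

struct ToyImage {
    std::string name;
    std::vector<ToyImage*> dependencies;
    ToyState state = ToyState::Bound;
};

// Depth-first initialization: dependencies first, then the image itself.
// Every notification and initializer call is appended to `events`.
void recursiveInit(ToyImage& image, std::vector<std::string>& events) {
    if (image.state != ToyState::Bound) return;  // finished or already in progress
    image.state = ToyState::Initializing;         // marking early breaks cycles
    for (ToyImage* dep : image.dependencies) {
        assert(dep != nullptr);
        recursiveInit(*dep, events);
    }
    image.state = ToyState::DependentsInitialized;
    events.push_back("notify dependents_initialized: " + image.name);
    events.push_back("doInitialization: " + image.name);
    image.state = ToyState::Initialized;
    events.push_back("notify initialized: " + image.name);
}
```

The point to notice is the order within one image: the dependents_initialized notification is sent before doInitialization runs.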
notifySingle
Let's first look at the implementation of notifySingle:
static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
//dyld::log("notifySingle(state=%d, image=%s)\n", state, image->getPath());
std::vector<dyld_image_state_change_handler>* handlers = stateToHandlers(state, sSingleHandlers);
if ( handlers != NULL ) {
dyld_image_info info;
info.imageLoadAddress = image->machHeader();
info.imageFilePath = image->getRealPath();
info.imageFileModDate = image->lastModified();
for (std::vector<dyld_image_state_change_handler>::iterator it = handlers->begin(); it != handlers->end(); ++it) {
const char* result = (*it)(state, 1, &info);
if ( (result != NULL) && (state == dyld_image_state_mapped) ) {
//fprintf(stderr, " image rejected by handler=%p\n", *it);
// make copy of thrown string so that later catch clauses can free it
const char* str = strdup(result);
throw str;
}
}
}
if ( state == dyld_image_state_mapped ) {
// <rdar://problem/7008875> Save load addr + UUID for images from outside the shared cache
if ( !image->inSharedCache() ) {
dyld_uuid_info info;
if ( image->getUUID(info.imageUUID) ) {
info.imageLoadAddress = image->machHeader();
addNonSharedCacheImageUUID(info);
}
}
}
if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
uint64_t t0 = mach_absolute_time();
dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
uint64_t t1 = mach_absolute_time();
uint64_t t2 = mach_absolute_time();
uint64_t timeInObjC = t1-t0;
uint64_t emptyTime = (t2-t1)*100;
if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
timingInfo->addTime(image->getShortName(), timeInObjC);
}
}
// mach message csdlc about dynamically unloaded images
if ( image->addFuncNotified() && (state == dyld_image_state_terminated) ) {
notifyKernel(*image, false);
const struct mach_header* loadAddress[] = { image->machHeader() };
const char* loadPath[] = { image->getPath() };
notifyMonitoringDyld(true, 1, loadAddress, loadPath);
}
}
recursiveInitialization calls notifySingle with the state dyld_image_state_dependents_initialized, so notifySingle executes (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());. So where does sNotifyObjCInit come from?
sNotifyObjCInit
A global search shows that sNotifyObjCInit is assigned in registerObjCNotifiers.
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
// record functions to call
sNotifyObjCMapped = mapped;
sNotifyObjCInit = init;
sNotifyObjCUnmapped = unmapped;
// call 'mapped' function with all images mapped so far
try {
notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
}
catch (const char* msg) {
// ignore request to abort during registration
}
// <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
ImageLoader* image = *it;
if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
}
}
}
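The pattern in registerObjCNotifiers is a stored callback plus a replay for images that were ready before anyone registered. A reduced sketch of that idea follows; ToyNotifier and its members are hypothetical, and the toy merges the "dependents initialized" and "initialized" states into a single list for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

using InitCallback = void (*)(const std::string& imagePath);

struct ToyNotifier {
    InitCallback onInit = nullptr;               // plays the role of sNotifyObjCInit
    std::vector<std::string> initializedImages;  // images that became ready so far

    // Stores the callback, then replays it for images that were ready earlier.
    void registerInit(InitCallback callback) {
        onInit = callback;
        for (const std::string& path : initializedImages) {
            (*onInit)(path);
        }
    }

    // Called when an image becomes ready; notifies only if someone registered.
    void imageReady(const std::string& path) {
        if (onInit != nullptr) {
            (*onInit)(path);
        }
        initializedImages.push_back(path);
    }
};
```

An image that became ready before registration is still reported once registerInit is called, which corresponds to the loop over sAllImages in the listing above.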
registerObjCNotifiers is in turn called from _dyld_objc_notify_register. So where is _dyld_objc_notify_register called?
A global search of the runtime source objc-781 shows that _objc_init calls it.
void _objc_init(void)
{
static bool initialized = false;
if (initialized) return;
initialized = true;
// fixme defer initialization until an objc-using image is found?
environ_init();
tls_init();
static_init();
runtime_init();
exception_init();
cache_init();
_imp_implementationWithBlock_init();
_dyld_objc_notify_register(&map_images, load_images, unmap_image);
#if __OBJC2__
didCallDyldNotifyRegister = true;
#endif
}
doInitialization
bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
CRSetCrashLogMessage2(this->getPath());
// mach-o has -init and static initializers
doImageInit(context);
doModInitFunctions(context);
CRSetCrashLogMessage2(NULL);
return (fHasDashInit || fHasInitializers);
}
doInitialization calls doImageInit and doModInitFunctions. These two methods look up the image's real initialization entry points (its initializers) in the image and call them.
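Conceptually, doModInitFunctions walks a table of function pointers stored in the image and calls each one with the same arguments that libSystem_initializer receives. The sketch below models only that walk; ToyProgramVars and runModInitFunctions are hypothetical, and how the real code locates the table inside the Mach-O file is not shown.

```cpp
#include <cassert>
#include <cstddef>

// ToyProgramVars is a hypothetical placeholder for the real ProgramVars struct.
struct ToyProgramVars { int unused; };

// Signature that initializers are called with.
using Initializer = void (*)(int argc, const char* argv[], const char* envp[],
                             const char* apple[], const ToyProgramVars* vars);

// Calls every entry of an initializer table in order and returns how many ran.
std::size_t runModInitFunctions(const Initializer* table, std::size_t count,
                                int argc, const char* argv[], const char* envp[],
                                const char* apple[], const ToyProgramVars* vars) {
    std::size_t called = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (table[i] == nullptr) continue;  // skip empty slots defensively
        table[i](argc, argv, envp, apple, vars);
        ++called;
    }
    return called;
}
```

Functions marked with the constructor attribute, as well as C++ static initializers, are the kind of entries such a table would hold.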
Because dyld itself cannot be stepped through in the debugger, we set a symbolic breakpoint on _objc_init to inspect the flow.
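The figure above shows that a dynamic library's initializers are actually invoked in ImageLoaderMachO::doModInitFunctions. For libSystem.B.dylib the initializer is libSystem_initializer, which calls libdispatch_init. libSystem and libdispatch are also open source, so the relevant code can be inspected.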
libSystem
libSystem_initializer first calls dyld's initialization function _dyld_initializer and then calls libdispatch_init, the initialization function of libdispatch.dylib.
__attribute__((constructor))
static void
libSystem_initializer(int argc,
const char* argv[],
const char* envp[],
const char* apple[],
const struct ProgramVars* vars)
{
...
_dyld_initializer();
_libSystem_ktrace_init_func(DYLD);
libdispatch_init();
_libSystem_ktrace_init_func(LIBDISPATCH);
...
}
libdispatch_init
libdispatch_init in turn calls _os_object_init.
void
libdispatch_init(void)
{
...
#endif
_dispatch_hw_config_init();
_dispatch_time_init();
_dispatch_vtable_init();
_os_object_init();
_voucher_init();
_dispatch_introspection_init();
}
_os_object_init
_os_object_init calls _objc_init directly, and _objc_init is a function from the runtime source objc-781.
extern void _objc_init(void);
void
_os_object_init(void)
{
_objc_init();
Block_callbacks_RR callbacks = {
sizeof(Block_callbacks_RR),
(void (*)(const void *))&objc_retain,
(void (*)(const void *))&objc_release,
(void (*)(const void *))&_os_objc_destructInstance
};
_Block_use_RR2(&callbacks);
#if DISPATCH_COCOA_COMPAT
const char *v = getenv("OBJC_DEBUG_MISSING_POOLS");
if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
v = getenv("DISPATCH_DEBUG_MISSING_POOLS");
if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
v = getenv("LIBDISPATCH_DEBUG_MISSING_POOLS");
if (v) _os_object_debug_missing_pools = _dispatch_parse_bool(v);
#endif
}
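Putting the pieces together, the call chain traced in this article can be replayed with a toy model that only records names. ToyLaunch and its methods are hypothetical and heavily simplified; the sketch assumes libSystem is initialized as a dependency before the app image, as the recursion described earlier implies.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyLaunch {
    std::vector<std::string> trace;
    bool objcRegistered = false;  // set once the toy _objc_init has run

    void objcInit()        { trace.push_back("_objc_init"); objcRegistered = true; }
    void osObjectInit()    { trace.push_back("_os_object_init"); objcInit(); }
    void libdispatchInit() { trace.push_back("libdispatch_init"); osObjectInit(); }
    void libSystemInitializer() {
        trace.push_back("libSystem_initializer");
        libdispatchInit();
    }

    void initializeAppImage() {
        // notifySingle(dependents_initialized) fires first and reaches load_images.
        if (objcRegistered) trace.push_back("load_images (+load)");
        // doInitialization then runs the image's own static initializers.
        trace.push_back("doModInitFunctions (C++ constructors)");
    }

    void run() {
        libSystemInitializer();  // dependency initialized before the app image
        initializeAppImage();
        trace.push_back("main");
    }
};
```

Running the model produces the same relative order that the demo printed: +load, then the C++ function, then main.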
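That completes the analysis. It also answers the question raised by the demo: +load is invoked through the load_images callback that notifySingle fires for dyld_image_state_dependents_initialized, which happens before doInitialization runs the image's C++ initializers, and main is entered only after dyld::_main has finished. To close, here is a diagram of the overall dyld flow: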