iOS Runtime, Part 3: Data Structures and Memory Layout of a Class

2022-08-30  Trigger_o

Class objects

objc_class

Defined in objc-runtime-new.h. It inherits from objc_object, so from here on it is simply called a class object.

struct objc_class : objc_object {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags

    Class getSuperclass() const {...}
    void setSuperclass(Class newSuperclass) {...}

    class_rw_t *data() const {
        return bits.data();
    }
    void setData(class_rw_t *newData) {
        bits.setData(newData);
    }
    // ...

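Start with these members. The first member is isa (inherited from objc_object); a class object's isa points to its metaclass.
The second member is a Class pointer to the superclass.
The third member, cache, is the method cache.
The fourth member, bits, holds the class's actual contents.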

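Next come getSuperclass() and setSuperclass(), which read and write the superclass.
On arm64e, when ISA_SIGNING_SIGN_MODE is not NONE, the pointer is signed on set and authenticated on get;
otherwise the setter assigns superclass directly and the getter returns it directly.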

The last two functions get and set a class_rw_t *, and both simply forward to bits.
So class_rw_t and class_data_bits_t are tied together. Let's look at each struct in turn.

class_data_bits_t

#if __LP64__
#define FAST_DATA_MASK          0x00007ffffffffff8UL
#else
#define FAST_DATA_MASK          0xfffffffcUL
#endif

struct class_data_bits_t {
    friend objc_class;

    uintptr_t bits;
public:
    class_rw_t* data() const {
        return (class_rw_t *)(bits & FAST_DATA_MASK);
    }
    void setData(class_rw_t *newData)
    {
        ASSERT(!data()  ||  (newData->flags & (RW_REALIZING | RW_FUTURE)));
        // Set during realization or construction only. No locking needed.
        // Use a store-release fence because there may be concurrent
        // readers of data and data's contents.
        uintptr_t newBits = (bits & ~FAST_DATA_MASK) | (uintptr_t)newData;
        atomic_thread_fence(memory_order_release);
        bits = newBits;
    }

    // Get the class's ro data, even in the presence of concurrent realization.
    // fixme this isn't really safe without a compiler barrier at least
    // and probably a memory barrier when realizeClass changes the data field
    const class_ro_t *safe_ro() const {
        class_rw_t *maybe_rw = data();
        if (maybe_rw->flags & RW_REALIZED) {
            // maybe_rw is rw
            return maybe_rw->ro();
        } else {
            // maybe_rw is actually ro
            return (class_ro_t *)maybe_rw;
        }
    }

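It has a single member, bits, of type uintptr_t: an unsigned integer type wide enough to hold a pointer, so it can carry a pointer value. On a 64-bit target it is 8 bytes.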

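setData() and data() write and read the class_rw_t pointer. The comment says setData() is called only during realization or construction, so it takes no lock, but it issues a release fence because other threads may be reading data concurrently.
On set, bits is ANDed with the complement of FAST_DATA_MASK and then ORed with newData; the result is the new bits.
Take 32-bit as an example: FAST_DATA_MASK is 0xfffffffc, so ~FAST_DATA_MASK is 0x00000003. If bits is xxxx xxxP and newData is xxxx xxxQ, the result is (bits & 3) | newData: the two low flag bits of the old value are kept and everything else comes from newData unchanged. On get, the low hex digit becomes P & 0xc, which strips those two flag bits.
0x00007ffffffffff8 in binary is 17 zero bits, then 44 one bits, then 3 zero bits. As with isa, the class_rw_t pointer is therefore stored in the 4th through 47th bits of bits (bits 3 to 46 counting from zero).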
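
The masking arithmetic can be checked with a small standalone sketch. The names pack, unpack, and popcount64 are hypothetical helpers written for this illustration and do not exist in objc4; only the mask value comes from the excerpt above. It assumes a C++14 compiler.

```cpp
#include <cstdint>

// Mask value copied from the 64-bit branch of the excerpt above.
constexpr std::uint64_t kFastDataMask = 0x00007ffffffffff8ULL;

// Hypothetical helper: the same arithmetic as setData().
// Keeps the flag bits of oldBits and takes the pointer bits from newData.
constexpr std::uint64_t pack(std::uint64_t oldBits, std::uint64_t newData) {
    return (oldBits & ~kFastDataMask) | newData;
}

// Hypothetical helper: the same arithmetic as data().
constexpr std::uint64_t unpack(std::uint64_t bits) {
    return bits & kFastDataMask;
}

// Counts the 1 bits in a value.
constexpr int popcount64(std::uint64_t v) {
    int n = 0;
    for (; v != 0; v &= v - 1) ++n;
    return n;
}

// The mask covers 44 bits, from bit 3 up to bit 46 (zero-based).
static_assert(popcount64(kFastDataMask) == 44, "44 pointer bits");
static_assert((kFastDataMask & 0x7) == 0, "bits 0-2 are outside the mask");
static_assert((kFastDataMask >> 46) == 1, "bit 46 is the highest mask bit");
```

For instance, packing the flag bits 0x5 with the pointer value 0x109412280 and unpacking again returns 0x109412280, while the low three bits of the packed word are still 0x5.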

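Next is safe_ro(), which fetches the class_ro_t. Its comment says it is meant to work even while the class is being realized concurrently, although the fixme admits this is not fully safe without a barrier.
The function has two cases. If the RW_REALIZED flag is set, data() really is a class_rw_t and the function returns maybe_rw->ro(), that is, the class_ro_t reached through the class_rw_t.
Otherwise the class has not been realized yet, and the pointer is simply cast and returned as a class_ro_t.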
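
The cast only makes sense because both structs start with a 32-bit flags word, so the flag can be inspected before knowing which struct is there. Below is a minimal sketch of that idea with simplified stand-in types (RoSketch, RwSketch, and safeRo are names invented here). That RW_REALIZED is bit 31 is an assumption based on my reading of objc4; the excerpt does not show its value, though it is consistent with the flags value 2148007936 (0x80080000) printed in the lldb session later in this article.

```cpp
#include <cstdint>

// Assumption: RW_REALIZED is bit 31 of the flags word.
constexpr std::uint32_t kRealized = 1u << 31;

// Simplified stand-in for class_ro_t: starts with a 32-bit flags word.
struct RoSketch {
    std::uint32_t flags;
    const char   *name;
};

// Simplified stand-in for class_rw_t: also starts with a 32-bit flags word.
struct RwSketch {
    std::uint32_t   flags;
    const RoSketch *ro;
};

// Same shape as safe_ro(): the stored pointer is either an RwSketch
// (class already realized) or still the original RoSketch.
inline const RoSketch *safeRo(const void *data) {
    // Both candidate types begin with flags, so reading it first is valid.
    const std::uint32_t flags = *static_cast<const std::uint32_t *>(data);
    if (flags & kRealized) {
        return static_cast<const RwSketch *>(data)->ro;
    }
    return static_cast<const RoSketch *>(data);
}
```

Passing the address of an RoSketch returns that same address, and passing an RwSketch whose flags contain kRealized returns its ro member. The sketch ignores the concurrency concerns raised in the fixme comment.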

class_rw_t

Now let's look at the structure of class_rw_t.

struct class_rw_t {
    // Be warned that Symbolication knows the layout of this structure.
    uint32_t flags;
    uint16_t witness;
    explicit_atomic<uintptr_t> ro_or_rw_ext;
    Class firstSubclass;
    Class nextSiblingClass;
private:
    using ro_or_rw_ext_t = objc::PointerUnion<const class_ro_t, class_rw_ext_t, PTRAUTH_STR("class_ro_t"), PTRAUTH_STR("class_rw_ext_t")>;
    const ro_or_rw_ext_t get_ro_or_rwe() const;
    void set_ro_or_rwe(const class_ro_t *ro);
    void set_ro_or_rwe(class_rw_ext_t *rwe, const class_ro_t *ro);
    class_rw_ext_t *extAlloc(const class_ro_t *ro, bool deep = false);
public:
    void setFlags(uint32_t set);
    void clearFlags(uint32_t clear);
    void changeFlags(uint32_t set, uint32_t clear);
    class_rw_ext_t *ext();
    class_rw_ext_t *extAllocIfNeeded();
    class_rw_ext_t *deepCopy(const class_ro_t *ro);
    const class_ro_t *ro() const;
    void set_ro(const class_ro_t *ro);
    const method_array_t methods() const;
    const property_array_t properties();
    const protocol_array_t protocols();
};

First, explicit_atomic: it inherits from C++ std::atomic and wraps a value so that multiple threads can access it without a data race.

using plays the same role as typedef here: it declares ro_or_rw_ext_t as a type name.
PointerUnion is a class whose purpose resembles a C union: it can hold a pointer to one of several types, but only one at a time.

template <class T1, class T2, typename Auth1, typename Auth2>
class PointerUnion {
uintptr_t _value;

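The template takes four parameters: two pointee types and two pointer-authentication tags (built here with PTRAUTH_STR). The types passed in are class_ro_t and class_rw_ext_t,
so a ro_or_rw_ext_t holds either a class_rw_ext_t pointer or a class_ro_t pointer.
PointerUnion provides is() to test which one it currently holds. The type is given explicitly at the call site, for example v.is<class_rw_ext_t *>(), which returns true if the union currently holds a class_rw_ext_t.
PointerUnion also provides get() to fetch the value. It is called the same way as is() and returns a pointer of the requested type.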

const ro_or_rw_ext_t get_ro_or_rwe() const {
        return ro_or_rw_ext_t{ro_or_rw_ext};
    }

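This method constructs a ro_or_rw_ext_t from ro_or_rw_ext. PointerUnion has only the single _value member, which is initialized from ro_or_rw_ext here.
All later creation and access of this data goes through PointerUnion, that is, through get_ro_or_rwe().
In some places an unnamed temporary ro_or_rw_ext_t{ro_or_rw_ext} is used instead; the purpose is the same, since any PointerUnion built from the same ro_or_rw_ext represents the same value.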

const method_array_t methods() const {
        auto v = get_ro_or_rwe();
        if (v.is<class_rw_ext_t *>()) {
            return v.get<class_rw_ext_t *>(&ro_or_rw_ext)->methods;
        } else {
            return method_array_t{v.get<const class_ro_t *>(&ro_or_rw_ext)->baseMethods()};
        }
    }

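The accessor member functions all begin by calling get_ro_or_rwe().
Take methods() above, which returns the method list: it first fetches the ro_or_rw_ext_t, then returns the methods member if it holds a class_rw_ext_t, or wraps baseMethods() in a method_array_t if it holds a class_ro_t.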
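
To make the "one word, two possible pointer types" idea concrete, here is a minimal sketch of a tagged pointer union. It is not the objc4 implementation: PointerUnionSketch, RoSketch, ExtSketch, and methodCount are names invented for this illustration, pointer authentication is left out entirely, and using the lowest bit of the word as the type tag is an assumption (the excerpt only shows that the class stores a single _value).

```cpp
#include <cstdint>

struct RoSketch  { int baseMethodCount; };  // stand-in for class_ro_t
struct ExtSketch { int methodCount; };      // stand-in for class_rw_ext_t

// Minimal tagged-pointer union: bit 0 of value_ records which type is stored.
// This works because both pointee types are at least 2-byte aligned,
// so bit 0 of a valid pointer is always zero.
class PointerUnionSketch {
    std::uintptr_t value_;

public:
    explicit PointerUnionSketch(const RoSketch *ro)
        : value_(reinterpret_cast<std::uintptr_t>(ro)) {}
    explicit PointerUnionSketch(ExtSketch *ext)
        : value_(reinterpret_cast<std::uintptr_t>(ext) | 1) {}

    bool isExt() const { return (value_ & 1) != 0; }
    bool isRo() const { return !isExt(); }

    const RoSketch *getRo() const {
        return reinterpret_cast<const RoSketch *>(value_);
    }
    ExtSketch *getExt() const {
        // Clear the tag bit before turning the word back into a pointer.
        return reinterpret_cast<ExtSketch *>(value_ & ~std::uintptr_t{1});
    }
};

// Same branching shape as class_rw_t::methods().
inline int methodCount(const PointerUnionSketch &u) {
    return u.isExt() ? u.getExt()->methodCount : u.getRo()->baseMethodCount;
}
```

The caller never needs to know which of the two structs is behind the word; it asks the union and branches, which is exactly the pattern methods(), properties(), and protocols() follow.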

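Finally, the size of class_rw_t: the fields add up to 4+2+8+8+8 = 30 bytes, but the 8-byte ro_or_rw_ext must start on an 8-byte boundary, so 2 bytes of padding follow witness and the struct occupies 32 bytes.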
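
The padding can be reproduced with a stand-in struct that has the same field widths. RwLayoutSketch and rwPaddingBytes are hypothetical names; plain integers and void * replace explicit_atomic<uintptr_t> and Class, on the assumption that those have the size of a pointer-sized word (which the lldb sizeof output later in this article supports).

```cpp
#include <cstddef>
#include <cstdint>

// Field-for-field stand-in for the data members of class_rw_t.
struct RwLayoutSketch {
    std::uint32_t  flags;             // 4 bytes at offset 0
    std::uint16_t  witness;           // 2 bytes at offset 4, then 2 bytes of padding
    std::uintptr_t ro_or_rw_ext;      // pointer-sized, aligned to its own size
    void          *firstSubclass;     // pointer-sized
    void          *nextSiblingClass;  // pointer-sized
};

// Bytes the compiler inserted beyond the raw sum of the field sizes.
constexpr std::size_t rwPaddingBytes() {
    return sizeof(RwLayoutSketch)
         - (sizeof(std::uint32_t) + sizeof(std::uint16_t)
            + sizeof(std::uintptr_t) + 2 * sizeof(void *));
}
```

On an LP64 target, sizeof(RwLayoutSketch) is 32, offsetof(RwLayoutSketch, ro_or_rw_ext) is 8, and rwPaddingBytes() is 2.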

class_rw_ext_t
The PointerUnion holds either a class_rw_ext_t pointer or a class_ro_t pointer, so let's look at those two structs.

struct class_rw_ext_t {
    DECLARE_AUTHED_PTR_TEMPLATE(class_ro_t)
    class_ro_t_authed_ptr<const class_ro_t> ro;
    method_array_t methods;
    property_array_t properties;
    protocol_array_t protocols;
    char *demangledName;
    uint32_t version;
};

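DECLARE_AUTHED_PTR_TEMPLATE declares an alias template for a wrapped pointer type; with the argument class_ro_t the resulting name is class_ro_t_authed_ptr.
The macro is defined like this: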

// (the opening #if of the pointer-authentication branch is elided in this excerpt)
#define DECLARE_AUTHED_PTR_TEMPLATE(name)                      \
    template <typename T> using name ## _authed_ptr            \
        = WrappedPtr<T, PTRAUTH_STR(name)>;
#else
#define PTRAUTH_STR(name) PtrauthRaw
#define DECLARE_AUTHED_PTR_TEMPLATE(name)                      \
    template <typename T> using name ## _authed_ptr = RawPtr<T>;
#endif

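Its purpose is pointer signing, for security. There are two definitions: one that signs pointers on arm64e and one that does not.
Besides signing there is also wrapping. The wrapper is the struct below; its ptr member is what ultimately points at the class_ro_t.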

template<typename T, typename Auth>
struct WrappedPtr {
private:
    T *ptr;

Besides ro, class_rw_ext_t also holds the method list, property list, and protocol list.

class_ro_t
Its contents are roughly as follows.

struct class_ro_t {
    uint32_t flags;
    uint32_t instanceStart;
    uint32_t instanceSize;
#ifdef __LP64__
    uint32_t reserved;
#endif
    union {
        const uint8_t * ivarLayout;
        Class nonMetaclass;
    };
    explicit_atomic<const char *> name;
    void *baseMethodList;
    protocol_list_t * baseProtocols;
    const ivar_list_t * ivars;
    const uint8_t * weakIvarLayout;
    property_list_t *baseProperties;
    method_list_t *baseMethods();
    Class getNonMetaclass();
    const uint8_t *getIvarLayout();
    // ...

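Both ro and rw expose methods, protocols, and properties, but the types differ: class_ro_t stores single lists (method_list_t, protocol_list_t, property_list_t), while class_rw_ext_t stores the array types method_array_t, property_array_t, and protocol_array_t.
class_ro_t is filled in when the class itself is created and offers no functions for modifying it, hence ro, read-only. Before the class is realized there is no rw at all, and bits.data() points straight at the class_ro_t, which is why safe_ro() has the cast branch.
class_rw_t is the read-write counterpart. Once the class has been realized, its ro_or_rw_ext holds either the class_ro_t pointer directly or a class_rw_ext_t, whose class_ro_t_authed_ptr<const class_ro_t> ro member then points at the class_ro_t.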

cache_t

struct cache_t {
private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask;
#if __LP64__
            uint16_t                   _flags;
#endif
            uint16_t                   _occupied;
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache;
    };
//...

explicit_atomic is a wrapper around C++ std::atomic. Here it wraps a uintptr_t, which is pointer-sized, that is, 8 bytes on a 64-bit target.

Next comes a union.

#if __LP64__
typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits
#else
typedef uint16_t mask_t;
#endif

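mask_t is an alias that takes 4 bytes on 64-bit. The anonymous struct is 4+2+2 bytes and _originalPreoptCache is a pointer, also 8 bytes, so the union is 8 bytes.
cache_t is therefore 8+8 = 16 bytes. Starting from the address of the class: isa_t (8 bytes) + Class (8 bytes) + cache_t (16 bytes) + class_data_bits_t (8 bytes).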
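
The same arithmetic can be confirmed with stand-in structs of matching field widths. CacheFieldsSketch, CacheSketch, and ClassSketch are hypothetical names; atomics are replaced by plain integers and Class by void *, and an LP64 target is assumed for the quoted numbers.

```cpp
#include <cstddef>
#include <cstdint>

// The struct arm of the union inside cache_t (LP64 variant).
struct CacheFieldsSketch {
    std::uint32_t maybeMask;  // mask_t is uint32_t on LP64
    std::uint16_t flags;
    std::uint16_t occupied;
};

// Stand-in for cache_t: one pointer-sized word plus an 8-byte union.
struct CacheSketch {
    std::uintptr_t bucketsAndMaybeMask;
    union {
        CacheFieldsSketch fields;
        void             *originalPreoptCache;
    };
};

// Stand-in for objc_class, including the isa inherited from objc_object.
struct ClassSketch {
    void          *isa;         // offset 0x00
    void          *superclass;  // offset 0x08
    CacheSketch    cache;       // offset 0x10, 16 bytes
    std::uintptr_t bits;        // offset 0x20
};
```

On LP64, sizeof(CacheSketch) is 16 and offsetof(ClassSketch, bits) is 0x20, the byte offset used in the lldb session further down.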

ivar_t
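The ivars member of class_ro_t has type ivar_list_t, which inherits from entsize_list_tt.
Many of the lists in ro and rw are either entsize_list_tt or list_array_tt, which is built on top of entsize_list_tt. For details see the next article: https://www.jianshu.com/p/52080de84f38
There is no need to dig into that yet. For now it is enough to know that entsize_list_tt behaves like an array and has a get() method to fetch an element, for example get(0).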

struct ivar_t {
    int32_t *offset;
    const char *name;
    const char *type;
    // alignment is sometimes -1; use alignment() instead
    uint32_t alignment_raw;
    uint32_t size;

    uint32_t alignment() const {
        if (alignment_raw == ~(uint32_t)0) return 1U << WORD_SHIFT;
        return 1 << alignment_raw;
    }
};

That is 8+8+8+4+4 = 32 bytes. How it is used is covered later.
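
The alignment() accessor stores the alignment as a power-of-two exponent, with all ones meaning "use the word size". A stand-in makes that easy to try out. IvarSketch and kWordShift are names invented here, and the value 3 for WORD_SHIFT on 64-bit targets is an assumption, since the excerpt does not show its definition.

```cpp
#include <cstdint>

// Assumption: WORD_SHIFT is 3 on 64-bit targets (log2 of an 8-byte word).
constexpr std::uint32_t kWordShift = 3;

// Stand-in with the same fields as ivar_t.
struct IvarSketch {
    std::int32_t *offset;         // points at the stored byte offset
    const char   *name;
    const char   *type;
    std::uint32_t alignment_raw;  // log2 of the alignment, or all ones
    std::uint32_t size;

    std::uint32_t alignment() const {
        // All ones means the alignment was not recorded: use the word size.
        if (alignment_raw == ~std::uint32_t{0}) return 1u << kWordShift;
        return 1u << alignment_raw;
    }
};
```

With alignment_raw = 3, as in the _num ivar printed later in this article, alignment() returns 8, which matches the 8-byte NSInteger. On LP64 the struct is 32 bytes, the same as the entsize reported by lldb.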

property_t

struct property_t {
    const char *name;
    const char *attributes;
};

property_t has just two string pointers, because all it does is describe the property.

Memory layout of instance variables

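ivar_list_t appears only in class_ro_t; rw contains nothing related to ivars.
When class_ro_t is set up, however, the instance variables are not created one by one; they are read from the Mach-O file. The only function in objc4 for adding one is class_addIvar, which adds instance variables dynamically and is not called when a class is loaded statically.
Static loading is a complicated process, so for instance variables it is easier to start by observing them through the runtime.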

Before that, let's see what an ivar_t looks like in memory.

#import <Foundation/Foundation.h>

@interface MyClass : NSObject
{
    NSInteger _num;
}
@end

@implementation MyClass

- (instancetype)init {
    if (self = [super init]) {
        _num = 5;
    }
    return self;
}

@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        MyClass *my = [[MyClass alloc] init];
        NSLog(@"Hello, World!");  // set the breakpoint on this line
        return 0;
    }
}

Declare a class and set a breakpoint on the NSLog line.

(lldb) p my.class
(Class) $0 = 0x0000000100008118
(lldb) p (class_data_bits_t *)$0 + 0x20
(class_data_bits_t *) $1 = 0x0000000100008218

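Starting from the address of the class object, bits sits at a byte offset of 8 (isa) + 8 (superclass) + 16 (cache) = 32, or 0x20 in hexadecimal. Note that the command above casts to class_data_bits_t * before adding, so the addition is scaled by sizeof(class_data_bits_t) = 8 and advances 0x100 bytes, which is why $1 is 0x0000000100008218. To land on bits itself, add the offset to the integer address first, as in (class_data_bits_t *)((uintptr_t)$0 + 0x20), which gives 0x0000000100008138.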
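
The difference between typed pointer arithmetic and a byte offset is easy to trip over, so here is a small sketch of both. BitsSketch, typedStepBytes, and byteOffsetAddress are hypothetical names for this illustration; BitsSketch stands in for class_data_bits_t and is 8 bytes on LP64.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for class_data_bits_t: a single pointer-sized word.
struct BitsSketch { std::uintptr_t bits; };

// Typed pointer arithmetic: "pointer + n" moves by n elements,
// that is, by n * sizeof(BitsSketch) bytes. n must be between 0 and 64
// so the result stays inside (or one past) the backing array.
inline std::ptrdiff_t typedStepBytes(std::ptrdiff_t n) {
    static BitsSketch storage[64] = {};
    BitsSketch *base = storage;
    return reinterpret_cast<char *>(base + n) - reinterpret_cast<char *>(base);
}

// Byte arithmetic: add the offset to the address as an integer,
// and only then convert the result to a typed pointer.
inline std::uintptr_t byteOffsetAddress(std::uintptr_t base, std::uintptr_t bytes) {
    return base + bytes;
}
```

On LP64, typedStepBytes(0x20) is 0x100, while byteOffsetAddress(0x8118, 0x20) is 0x8138; the second form is the one that matches the 8+8+16 layout calculation.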

(lldb) p (objc_class *)$0
(objc_class *) $2 = 0x0000000100008118
(lldb) p $2->data()
(class_rw_t *) $3 = 0x0000000109412280
(lldb) p $2->safe_ro()
(const class_ro_t *) $4 = 0x0000000100008098

Cast the Class to objc_class *, then fetch the class_rw_t and the class_ro_t.

(lldb) p $3->ro_or_rw_ext
(explicit_atomic<unsigned long>) $5 = {
  std::__1::atomic<unsigned long> = {
    Value = 4295000216
  }
}
(lldb) p/x 4295000216
(long) $6 = 0x0000000100008098
(lldb) p $3->ro()
(const class_ro_t *) $7 = 0x0000000100008098

Print ro_or_rw_ext from rw; at this point it is simply the address of ro. Calling ro() gives the same result.

(lldb) p *$3
(class_rw_t) $8 = {
  flags = 2148007936
  witness = 1
  ro_or_rw_ext = {
    std::__1::atomic<unsigned long> = {
      Value = 4295000216
    }
  }
  firstSubclass = nil
  nextSiblingClass = 0x00007ff85e83b9c8
}
(lldb) p sizeof($3->flags)
(unsigned long) $9 = 4
(lldb) p sizeof($3->witness)
(unsigned long) $10 = 2
(lldb) p sizeof($3->ro_or_rw_ext)
(unsigned long) $11 = 8
(lldb) p sizeof($3->firstSubclass)
(unsigned long) $12 = 8
(lldb) p sizeof($3->nextSiblingClass)
(unsigned long) $13 = 8
(lldb) p sizeof(*$3)
(unsigned long) $14 = 32

Print the whole rw. The sizes also show the effect of alignment: the fields total 30 bytes, but sizeof reports 32.

(lldb) p *$4
(const class_ro_t) $15 = {
  flags = 128
  instanceStart = 8
  instanceSize = 16
  reserved = 0
   = {
    ivarLayout = 0x0000000000000000
    nonMetaclass = nil
  }
  name = {
    std::__1::atomic<const char *> = "MyClass" {
      Value = 0x0000000100003fa8 "MyClass"
    }
  }
  baseMethods = {
    ptr = nil
  }
  baseProtocols = nil
  ivars = 0x0000000100008070
  weakIvarLayout = 0x0000000000000000
  baseProperties = nil
  _swiftMetadataInitializer_NEVER_USE = {}
}

Then look at the contents of class_ro_t. name is the name of the class this ro belongs to,
and ivars is non-null.

(lldb) p *$4->ivars
(const ivar_list_t) $16 = {
  entsize_list_tt<ivar_t, ivar_list_t, 0, PointerModifierNop> = (entsizeAndFlags = 32, count = 1)
}
(lldb) p $16->get(0)
(ivar_t) $17 = {
  offset = 0x00000001000080e8
  name = 0x0000000100003fb0 "_num"
  type = 0x0000000100003fb5 "q"
  alignment_raw = 3
  size = 8
}

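Print ivars and take element 0.
So where is the value of _num actually stored? It has to be located through offset. offset describes where the ivar sits relative to the start of the instance, but it is a pointer: the address it points to holds the actual offset.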

(lldb) p/x my
(MyClass *) $18 = 0x0000000108e4c3b0
(lldb) x/wx 0x00000001000080e8
0x1000080e8: 0x00000008

So _num is stored 8 bytes past the start of my. The object header is itself 8 bytes (the size of isa), so _num comes immediately after isa.

(lldb) x/4gx $18
0x108e4c3b0: 0x011d800100008129 0x0000000000000005
0x108e4c3c0: 0x0000000108e4c490 0x0000000108e4c6d0

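Read memory at the address my points to, four 8-byte words. The first word is isa and the second holds the value of _num.
If the instance variable were a pointer, those 8 bytes would hold the pointer.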
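
The two-step lookup (read the offset through the pointer, then read the value at base + offset) can be sketched as below. InstanceSketch and readIvar are hypothetical names; the struct only imitates the layout seen in the memory dump above (an 8-byte isa followed by an 8-byte _num), and the sketch does not call the Objective-C runtime.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for an instance of MyClass: isa followed by the _num ivar.
struct InstanceSketch {
    std::uint64_t isa;
    std::int64_t  num;
};

// Reads an 8-byte ivar the way the offset field describes it:
// instance base address + the byte offset stored at *offsetPtr.
inline std::int64_t readIvar(const void *obj, const std::int32_t *offsetPtr) {
    std::int64_t value;
    // memcpy avoids any aliasing or alignment assumptions about obj.
    std::memcpy(&value,
                static_cast<const unsigned char *>(obj) + *offsetPtr,
                sizeof value);
    return value;
}
```

With an InstanceSketch whose num is 5 and a stored offset of 8, readIvar returns 5, mirroring the second word of the x/4gx output. Keeping the offset behind a pointer means the stored number can be changed in one place without touching code that reads through it.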

Memory layout of properties

@interface MyClass : NSObject

@property(nonatomic, strong) NSNumber *number;
@property(nonatomic, assign) NSInteger integer;
@property(atomic, assign) NSInteger atomic;
@property(nonatomic, copy) NSString *Str;
@property(nonatomic, weak) NSObject *weak;
@property(nonatomic, strong, readonly) NSObject *readonly;

@end

@implementation MyClass

- (instancetype)init{
    if(self = [super init]){
        _readonly = NSObject.new;
    }
    return self;
}

@end

Define six properties, each with different attributes.

(lldb) p my.class
(Class) $0 = 0x0000000100008408
(lldb) p (objc_class *)$0
(objc_class *) $1 = 0x0000000100008408
(lldb) p $1->data()
(class_rw_t *) $2 = 0x0000000108e27090
(lldb) p $2->ro()
(const class_ro_t *) $3 = 0x0000000100008328
(lldb) p *$3
(const class_ro_t) $4 = {
  flags = 388
  instanceStart = 8
  instanceSize = 56
  reserved = 0
   = {
    ivarLayout = 0x0000000100003f46 "\U00000001!\U00000011"
    nonMetaclass = 0x0000000100003f46
  }
  name = {
    std::__1::atomic<const char *> = "MyClass" {
      Value = 0x0000000100003f3e "MyClass"
    }
  }
  baseMethods = {
    ptr = 0x00000001000080b8
  }
  baseProtocols = nil
  ivars = 0x00000001000081f8
  weakIvarLayout = 0x0000000100003f4a "A"
  baseProperties = 0x00000001000082c0
  _swiftMetadataInitializer_NEVER_USE = {}
}
(lldb) p $4.baseProperties
(property_list_t *const) $5 = 0x00000001000082c0
(lldb) p *$5
(property_list_t) $6 = {
  entsize_list_tt<property_t, property_list_t, 0, PointerModifierNop> = (entsizeAndFlags = 16, count = 6)
}
(lldb) p $6.get(0)
(property_t) $7 = (name = "number", attributes = "T@\"NSNumber\",&,N,V_number")
(lldb) p $6.get(1)
(property_t) $8 = (name = "integer", attributes = "Tq,N,V_integer")
(lldb) p $6.get(2)
(property_t) $9 = (name = "atomic", attributes = "Tq,V_atomic")
(lldb) p $6.get(3)
(property_t) $10 = (name = "Str", attributes = "T@\"NSString\",C,N,V_Str")
(lldb) p $6.get(4)
(property_t) $11 = (name = "weak", attributes = "T@\"NSObject\",W,N,V_weak")
(lldb) p $6.get(5)
(property_t) $12 = (name = "readonly", attributes = "T@\"NSObject\",R,N,V_readonly")

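property_t stores name and attributes. An attribute string looks like T@"NSObject",R,N,V_readonly, and the rules are:
It starts with T followed by the @encode type and a comma; for example NSInteger is q and NSNumber is @"NSNumber".
Then come the attribute codes, separated by commas.
It ends with V followed by the name of the backing instance variable, which here is an underscore plus the property name. More on that below.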

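Official documentation: the "Declared Properties" chapter of Apple's Objective-C Runtime Programming Guide.

The attribute codes that appear in the output above are & (strong/retain), C (copy), W (weak), R (readonly), and N (nonatomic); an atomic property simply has no N.

[Image: table of attribute codes from the official documentation]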

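The documentation also gives some examples:
Tc, Td, Ti, and Tf are char, double, enum/int, and float.
Some cases deserve attention. For example, @property(getter=intGetFoo, setter=intSetFoo:) int intSetterGetter; is encoded as Ti,GintGetFoo,SintSetFoo:,VintSetterGetter
Pointers add a ^: int * is T^i and void * is T^v.
The id type is T@, that is, the class name after @ is empty.
And so on.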
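
As a way to internalize the format, here is a small parser sketch for attribute strings. PropertyInfoSketch and parseAttributes are names invented here, not runtime API, and only the codes discussed in this article are handled. It splits on commas, which is enough for the strings shown above but is an assumption that may not hold for every possible type encoding.

```cpp
#include <sstream>
#include <string>

// Decoded view of one attribute string.
struct PropertyInfoSketch {
    std::string typeEncoding;  // text after the leading 'T'
    std::string ivarName;      // text after 'V'
    std::string getter;        // text after 'G', if any
    std::string setter;        // text after 'S', if any
    bool readOnly  = false;    // R
    bool copy      = false;    // C
    bool retain    = false;    // &
    bool weak      = false;    // W
    bool nonatomic = false;    // N
};

// Splits the string on commas and interprets the first character of each field.
inline PropertyInfoSketch parseAttributes(const std::string &attrs) {
    PropertyInfoSketch info;
    std::istringstream in(attrs);
    std::string field;
    while (std::getline(in, field, ',')) {
        if (field.empty()) continue;
        const std::string rest = field.substr(1);
        switch (field[0]) {
            case 'T': info.typeEncoding = rest; break;
            case 'V': info.ivarName = rest;     break;
            case 'G': info.getter = rest;       break;
            case 'S': info.setter = rest;       break;
            case 'R': info.readOnly = true;     break;
            case 'C': info.copy = true;         break;
            case '&': info.retain = true;       break;
            case 'W': info.weak = true;         break;
            case 'N': info.nonatomic = true;    break;
            default: break;  // codes not covered in this article are ignored
        }
    }
    return info;
}
```

Feeding it T@"NSNumber",&,N,V_number yields the type encoding @"NSNumber", the retain and nonatomic flags, and the ivar name _number; feeding it Tq,V_atomic leaves nonatomic false, which is how an atomic property is recognized.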


So where do a property's real structure and data live?

Continuing the lldb session above:

(lldb) p $4.ivars
(const ivar_list_t *const) $13 = 0x00000001000081f8
(lldb) p *$13
(const ivar_list_t) $14 = {
  entsize_list_tt<ivar_t, ivar_list_t, 0, PointerModifierNop> = (entsizeAndFlags = 32, count = 6)
}
(lldb) p $14.get(0)
(ivar_t) $15 = {
  offset = 0x00000001000083d8
  name = 0x0000000100003e90 "_number"
  type = 0x0000000100003f7a "@\"NSNumber\""
  alignment_raw = 3
  size = 8
}
(lldb) p $14.get(1)
(ivar_t) $16 = {
  offset = 0x00000001000083e0
  name = 0x0000000100003e98 "_integer"
  type = 0x0000000100003f86 "q"
  alignment_raw = 3
  size = 8
}

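So declaring a property also generates a backing ivar.

That is not all, though. @property also generates a setter and a getter; those are analyzed later, in the articles on methods, messaging, and class loading.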
