DWARF & Symbol

2021-03-22  纯情_小火鸡

1. The relationship between DWARF and dSYM

DWARF (Debug With Arbitrary Record Format) is a standard format for debugging information. When it is saved on its own, it becomes a dSYM (Debug Symbol File). Opening a dSYM in MachOView shows many DWARF sections.

Comparing build logs shows that the Generate Debug Symbols setting actually controls clang's -g and -gmodules flags. The clang documentation confirms that these flags generate debug information:

-gmodules
Generate debug info with external references to clang modules or precompiled headers

-g, --debug, --debug=<arg>
Generate source-level debug information

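The corresponding part of the Clang source (I used the latest source, LLVM 12) shows how the version is chosen: if -gdwarf-X is given, DWARF version X is used; otherwise, if -fdebug-default-version= specifies a version, that version is used; if neither is set, the toolchain default applies, which is DWARF 4.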

  bool WantDebug = false;
  unsigned DwarfVersion = 0;
  Args.ClaimAllArgs(options::OPT_g_Group);
  if (Arg *A = Args.getLastArg(options::OPT_g_Group)) {
    WantDebug = !A->getOption().matches(options::OPT_g0) &&
                !A->getOption().matches(options::OPT_ggdb0);
    if (WantDebug)
      DwarfVersion = DwarfVersionNum(A->getSpelling());
  }

  unsigned DefaultDwarfVersion = ParseDebugDefaultVersion(getToolChain(), Args);
  if (DwarfVersion == 0)
    DwarfVersion = DefaultDwarfVersion;

  if (DwarfVersion == 0)
    DwarfVersion = getToolChain().GetDefaultDwarfVersion();
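The precedence in this excerpt can be modeled in a few lines of plain C. This is a simplified sketch, not the Clang driver code: dwarf_version_num, disables_debug and pick_dwarf_version are hypothetical helpers, and the toolchain default is passed in as a parameter because it may differ between toolchains and LLVM versions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper in the spirit of Clang's DwarfVersionNum():
   "-gdwarf-N" yields N, any other spelling (e.g. plain "-g") yields 0. */
static unsigned dwarf_version_num(const char *spelling) {
    if (strcmp(spelling, "-gdwarf-2") == 0) return 2;
    if (strcmp(spelling, "-gdwarf-3") == 0) return 3;
    if (strcmp(spelling, "-gdwarf-4") == 0) return 4;
    if (strcmp(spelling, "-gdwarf-5") == 0) return 5;
    return 0;
}

/* -g0 / -ggdb0 turn debug info off (WantDebug == false in the excerpt). */
static int disables_debug(const char *spelling) {
    return strcmp(spelling, "-g0") == 0 || strcmp(spelling, "-ggdb0") == 0;
}

/* Same precedence as the excerpt:
   last_g_option      last option of the -g group, or NULL if none was given
   debug_default      value of -fdebug-default-version=, or 0 if absent
   toolchain_default  what GetDefaultDwarfVersion() returns (assumed input) */
static unsigned pick_dwarf_version(const char *last_g_option,
                                   unsigned debug_default,
                                   unsigned toolchain_default) {
    unsigned version = 0;
    if (last_g_option != NULL && !disables_debug(last_g_option))
        version = dwarf_version_num(last_g_option);
    if (version == 0)
        version = debug_default;
    if (version == 0)
        version = toolchain_default;
    return version;
}
```

For the command used in the next section, clang -O0 -gdwarf-4 foo.c -o foo, this model picks 4 from the flag itself; with plain -g it falls through to -fdebug-default-version= and then to the toolchain default.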

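Debug symbols are stripped from the binary when an app is packaged for release, but crash stacks collected in production still have to be mapped back to source code. That is why the symbols are written to a separate dSYM file.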

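The debug symbol table is a mapping table: it maps each machine instruction in the compiled binary back to the line of source code that produced it. Debug symbols are stored either with the compiled binary or in a separate debug symbol file (the dSYM file). Typically, apps built in Debug mode keep the debug symbols with the binary, while apps built in Release mode store them in a dSYM file to keep the binary small. (Strictly speaking, on Apple platforms the DWARF of a Debug build stays in the .o files, and the binary only records a debug map pointing to them.) The Xcode build log shows that the dSYM file is generated by the dsymutil tool.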
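To make the "mapping table" idea concrete, here is a minimal C sketch of an address-to-line lookup over an already decoded line table sorted by address. The line_row type and its rows are hypothetical simplifications: real __debug_line data is a compact state-machine program that tools such as dwarfdump decode. The rows are modeled on the foo example in the next section (0x100003f6a is where lldb reports foo.c:4), not copied from a real dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified row of a decoded line table. */
typedef struct {
    uint64_t address; /* first instruction address covered by this row */
    unsigned line;    /* source line that produced it */
} line_row;

/* Illustrative rows modeled on the foo.c example (not a real dump). */
static const line_row kRows[] = {
    {0x100003f60ULL, 1}, /* foo: function entry */
    {0x100003f6aULL, 4}, /* foo: c = a + b; */
    {0x100003f80ULL, 8}, /* main */
};

/* Return the line of the last row whose address is <= addr, or 0 if addr
   comes before the first row. Binary search over rows sorted by address.
   This sketch ignores end-of-sequence markers, so addresses past the last
   row map to the last row's line. */
static unsigned line_for_address(const line_row *rows, size_t n, uint64_t addr) {
    size_t lo = 0, hi = n; /* find the first row with address > addr */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rows[mid].address <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? 0 : rows[lo - 1].line;
}
```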

2. Generating DWARF debug information with Clang

int foo(int a, int b) {
    int c;
    static double d = 5.0;
    c = a + b;
    return c;
}

int main() {
    int r;
    r = foo(2, 3);
    return 0;
}
clang -O0 -gdwarf-4 foo.c -o foo

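After compilation, a foo.dSYM bundle appears next to the foo executable. (When -g is passed and clang compiles and links in a single step, the driver runs dsymutil automatically.)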

GIH-D-21687:Release-iphoneos n14637$ clang -O0 -gdwarf-4 foo.c -o foo
GIH-D-21687:Release-iphoneos n14637$
GIH-D-21687:Release-iphoneos n14637$ lldb foo
(lldb) target create "foo"
Current executable set to '/Desktop/bcTest/foo' (x86_64).
(lldb) b foo
Breakpoint 1: where = foo`foo + 10 at foo.c:4:9, address = 0x0000000100003f6a
(lldb) run
Process 15959 launched: '/Desktop/bcTest/foo' (x86_64)
Process 15959 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003f6a foo`foo(a=2, b=3) at foo.c:4:9
   1    int foo(int a, int b) {
   2        int c;
   3        static double d = 5.0;
-> 4        c = a + b;
   5        return c;
   6    }
   7
Target 0: (foo) stopped.
(lldb)
GIH-D-21687:DWARF n14637$ size -x -m -l foo
Segment __PAGEZERO: 0x100000000 (vmaddr 0x0 fileoff 0)
Segment __TEXT: 0x4000 (vmaddr 0x100000000 fileoff 0)
    Section __text: 0x4b (addr 0x100003f60 offset 0)
    Section __unwind_info: 0x48 (addr 0x100003fac offset 0)
    total 0x93
Segment __DATA: 0x4000 (vmaddr 0x100004000 fileoff 0)
    Section __data: 0x8 (addr 0x100004000 offset 0)
    total 0x8
Segment __LINKEDIT: 0x1000 (vmaddr 0x100008000 fileoff 4096)
Segment __DWARF: 0x1000 (vmaddr 0x100009000 fileoff 8192)
    Section __debug_line: 0x69 (addr 0x100009000 offset 8192)
    Section __debug_pubnames: 0x29 (addr 0x100009069 offset 8297)
    Section __debug_pubtypes: 0x25 (addr 0x100009092 offset 8338)
    Section __debug_aranges: 0x40 (addr 0x1000090b7 offset 8375)
    Section __debug_info: 0xc2 (addr 0x1000090f7 offset 8439)
    Section __debug_abbrev: 0x7e (addr 0x1000091b9 offset 8633)
    Section __debug_str: 0xd0 (addr 0x100009237 offset 8759)
    Section __apple_names: 0x74 (addr 0x100009307 offset 8967)
    Section __apple_namespac: 0x24 (addr 0x10000937b offset 9083)
    Section __apple_types: 0x72 (addr 0x10000939f offset 9119)
    Section __apple_objc: 0x24 (addr 0x100009411 offset 9233)
    total 0x435
total 0x10000a000

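As shown, the __DWARF segment contains several sections, such as __debug_line, __debug_pubnames and __debug_pubtypes.
These sections are how DWARF is stored in a .dSYM: they live in the Mach-O file foo.dSYM/Contents/Resources/DWARF/foo, which is also the file dwarfdump reads below.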

GIH-D-21687:DWARF n14637$ dwarfdump foo.dSYM/Contents/Resources/DWARF/foo --debug-info
foo:    file format Mach-O 64-bit x86-64

.debug_info contents:
0x00000000: Compile Unit: length = 0x000000be version = 0x0004 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x000000c2)

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer    ("Apple clang version 12.0.0 (clang-1200.0.32.28)")
              DW_AT_language    (DW_LANG_C99)
              DW_AT_name    ("foo.c")
              DW_AT_LLVM_sysroot    ("/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk")
              DW_AT_APPLE_sdk   ("MacOSX10.15.sdk")
              DW_AT_stmt_list   (0x00000000)
              DW_AT_comp_dir    ("/Users/n14637/Desktop/bcTest/app/Release-iphoneos")
              DW_AT_low_pc  (0x0000000100003f60)
              DW_AT_high_pc (0x0000000100003fab)

0x00000032:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000100003f60)
                DW_AT_high_pc   (0x0000000100003f78)
                DW_AT_frame_base    (DW_OP_reg6 RBP)
                DW_AT_name  ("foo")
                DW_AT_decl_file ("/Users/n14637/Desktop/bcTest/app/Release-iphoneos/foo.c")
                DW_AT_decl_line (1)
                DW_AT_prototyped    (true)
                DW_AT_type  (0x000000ba "int")
                DW_AT_external  (true)

0x0000004b:     DW_TAG_variable
                  DW_AT_name    ("d")
                  DW_AT_type    (0x0000008b "double")
                  DW_AT_decl_file   ("/Users/n14637/Desktop/bcTest/app/Release-iphoneos/foo.c")
                  DW_AT_decl_line   (3)
                  DW_AT_location    (DW_OP_addr 0x100004000)

0x00000060:     DW_TAG_formal_parameter
                  DW_AT_location    (DW_OP_fbreg -4)
                  DW_AT_name    ("a")
                  DW_AT_decl_file   ("/Users/n14637/Desktop/bcTest/app/Release-iphoneos/foo.c")
                  DW_AT_decl_line   (1)
                  DW_AT_type    (0x000000ba "int")

0x0000006e:     DW_TAG_formal_parameter
                  DW_AT_location    (DW_OP_fbreg -8)
                  DW_AT_name    ("b")
                  DW_AT_decl_file   ("/Users/n14637/Desktop/bcTest/app/Release-iphoneos/foo.c")
                  DW_AT_decl_line   (1)
                  DW_AT_type    (0x000000ba "int")

0x0000007c:     DW_TAG_variable
                  DW_AT_location    (DW_OP_fbreg -12)
                  DW_AT_name    ("c")
                  DW_AT_decl_file   ("/Users/n14637/Desktop/bcTest/app/Release-iphoneos/foo.c")
                  DW_AT_decl_line   (2)
                  DW_AT_type    (0x000000ba "int")

0x0000008a:     NULL

0x0000008b:   DW_TAG_base_type
                DW_AT_name  ("double")
                DW_AT_encoding  (DW_ATE_float)
                DW_AT_byte_size (0x08)

0x00000092:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000100003f80)
                DW_AT_high_pc   (0x0000000100003fab)
                DW_AT_frame_base    (DW_OP_reg6 RBP)
                DW_AT_name  ("main")
                DW_AT_decl_file ("/Users/n14637/Desktop/bcTest/app/Release-iphoneos/foo.c")
                DW_AT_decl_line (8)
                DW_AT_type  (0x000000ba "int")
                DW_AT_external  (true)

0x000000ab:     DW_TAG_variable
                  DW_AT_location    (DW_OP_fbreg -8)
                  DW_AT_name    ("r")
                  DW_AT_decl_file   ("/Users/n14637/Desktop/bcTest/app/Release-iphoneos/foo.c")
                  DW_AT_decl_line   (9)
                  DW_AT_type    (0x000000ba "int")

0x000000b9:     NULL

0x000000ba:   DW_TAG_base_type
                DW_AT_name  ("int")
                DW_AT_encoding  (DW_ATE_signed)
                DW_AT_byte_size (0x04)

0x000000c1:   NULL

3. The info section

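The info section is the core of DWARF: it describes the structure of the program. To describe this structure in a uniform way, DWARF defines the Debugging Information Entry (DIE). Here is part of the official specification's description: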

DWARF uses a series of debugging information entries (DIEs) to define a low-level representation of a source program. Each debugging information entry consists of an identifying tag and a series of attributes. An entry, or group of entries together, provide a description of a corresponding entity in the source program. The tag specifies the class to which an entry belongs and the attributes define the specific characteristics of the entry.
The debugging information entries are contained in the .debug_info and .debug_types sections of an object file.
Each attribute value is characterized by an attribute name. No more than one attribute with a given name may appear in any debugging information entry. There are no limitations on the ordering of attributes within a debugging information entry.
A variety of needs can be met by permitting a single debugging information entry to “own” an arbitrary number of other debugging entries and by permitting the same debugging information entry to be one of many owned by another debugging information entry. This makes it possible, for example, to describe the static block structure within a source file, to show the members of a structure, union, or class, and to associate declarations with source files or source files with shared objects.
The ownership relation of debugging information entries is achieved naturally because the debugging information is represented as a tree. The nodes of the tree are the debugging information entries themselves. The child entries of any node are exactly those debugging information entries owned by that node.

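In other words, debug information is represented as a tree whose nodes are DIEs. A DIE can own several child DIEs, just as a file can contain N functions and a function can contain X formal parameters and Y local variables. Each DIE itself consists of an identifying tag (such as DW_TAG_subprogram or DW_TAG_variable in the dump above), which says what kind of entity it describes, and a series of attributes (such as DW_AT_name, DW_AT_type and DW_AT_location) that describe its characteristics.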
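The tag/attributes/children shape can be modeled with a small C struct. This is a hypothetical in-memory model for illustration only, not the on-disk encoding (which refers to abbreviation codes stored in __debug_abbrev). It rebuilds the foo subprogram from the dwarfdump output above and looks up a variable's DW_OP_fbreg offset, i.e. its location relative to the frame base (RBP here). The NULL entries in the dump mark the end of a sibling list; in this sketch an explicit child count plays that role.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory DIE: a tag, two attributes, and child DIEs. */
typedef struct die {
    const char *tag;            /* e.g. "DW_TAG_subprogram" */
    const char *name;           /* DW_AT_name */
    int fbreg;                  /* DW_AT_location as DW_OP_fbreg offset; 0 = none here */
    const struct die *children; /* owned child entries */
    size_t child_count;
} die;

/* Children of foo, copied from the dwarfdump output above. */
static const die kFooChildren[] = {
    {"DW_TAG_variable", "d", 0, NULL, 0}, /* static: DW_OP_addr, not fbreg */
    {"DW_TAG_formal_parameter", "a", -4, NULL, 0},
    {"DW_TAG_formal_parameter", "b", -8, NULL, 0},
    {"DW_TAG_variable", "c", -12, NULL, 0},
};

static const die kFoo = {
    "DW_TAG_subprogram", "foo", 0,
    kFooChildren, sizeof kFooChildren / sizeof kFooChildren[0],
};

/* Find a direct child by DW_AT_name; NULL if absent. */
static const die *find_child(const die *parent, const char *name) {
    for (size_t i = 0; i < parent->child_count; i++)
        if (strcmp(parent->children[i].name, name) == 0)
            return &parent->children[i];
    return NULL;
}

/* Count a DIE and all its descendants (depth-first). */
static size_t count_dies(const die *d) {
    size_t n = 1;
    for (size_t i = 0; i < d->child_count; i++)
        n += count_dies(&d->children[i]);
    return n;
}
```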

4. Symbol structure

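struct nlist_64 is the data structure of a symbol. The symbol's name is not stored in the symbol table itself but in the String Table, which holds all the strings. n_strx gives the position (byte offset) of the name in the String Table, and that is how the correct symbol name is found.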

struct nlist_64 {
    union {
        uint32_t  n_strx; /* index into the string table */ // where the symbol's name starts in the String Table
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};
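A minimal C sketch of resolving a name through n_strx. To stay self-contained it redeclares the struct with the same layout as struct nlist_64 in <mach-o/nlist.h>, named my_nlist_64 to avoid clashing with the system header. The string table contents are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlist_64 in <mach-o/nlist.h>, redeclared here. */
struct my_nlist_64 {
    union { uint32_t n_strx; } n_un;
    uint8_t  n_type;
    uint8_t  n_sect;
    uint16_t n_desc;
    uint64_t n_value;
};

/* Hypothetical string table: offset 1 -> "_foo", offset 6 -> "_main". */
static const char kStrtab[] = "\0_foo\0_main";

/* n_strx is a byte offset into the string table, whose entries are
   NUL-terminated. Returns NULL if the offset is out of range. */
static const char *symbol_name(const struct my_nlist_64 *sym,
                               const char *strtab, uint32_t strsize) {
    if (sym->n_un.n_strx >= strsize)
        return NULL;
    return strtab + sym->n_un.n_strx;
}
```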

Symbol Table

The symbol table stores symbol information. Both ld and dyld read it when linking.

Dynamic Symbol Table

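The Dynamic Symbol Table stores only the index of each symbol in the Symbol Table, not the symbol structure itself, because symbol structures are stored only in the Symbol Table. The otool command (otool -I) shows, for each entry in the dynamic symbol table, the symbol's index in the Symbol Table. This is why dynamic symbols are also called indirect symbols.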

[Figure: otool output of the Dynamic Symbol Table]

__la_symbol_ptr

The otool output above contains "Indirect symbols for (__DATA,__la_symbol_ptr) 9 entries". __la_symbol_ptr holds lazy symbol pointers, which are bound only when the symbol is first used.

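First, a pointer is created in __DATA,__la_symbol_ptr. At build time it points into __TEXT,__stub_helper. On the first call, dyld_stub_binder binds the pointer to the function's implementation, so later calls no longer need to bind.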
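The bind-on-first-call behavior can be simulated with an ordinary function pointer. This is only an analogy for the mechanism described above, not dyld's implementation: lazy_ptr stands in for a __la_symbol_ptr slot, stub_helper for the __stub_helper path plus dyld_stub_binder, and real_add for the imported function; all of these names are hypothetical.

```c
/* The "imported" function the program actually wants to call. */
static int real_add(int a, int b) { return a + b; }

static int bind_count = 0; /* how many times the "binder" ran */

static int stub_helper(int a, int b);

/* Stand-in for a __la_symbol_ptr slot: initially points at the helper. */
static int (*lazy_ptr)(int, int) = stub_helper;

/* The first call lands here: "bind" the slot to the real implementation,
   then forward the call. Later calls go straight to real_add. */
static int stub_helper(int a, int b) {
    bind_count++;
    lazy_ptr = real_add;
    return lazy_ptr(a, b);
}

/* Call sites always go through the pointer, as a __stubs entry would. */
static int call_add(int a, int b) { return lazy_ptr(a, b); }
```

Calling call_add twice runs the helper only once; the second call goes directly through the patched pointer.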

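The section_64 structure has reserved fields. If the section is __DATA,__la_symbol_ptr, its reserved1 field holds the index in the Dynamic Symbol Table at which this section's entries begin (an index into the table rather than a byte offset).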

[Figure: __la_symbol_ptr]

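The process for finding the symbol behind a __la_symbol_ptr entry is:

1. Iterate over the load commands; when the __DATA,__la_symbol_ptr section is found, read its reserved1, i.e. the index in the Dynamic Symbol Table where the __la_symbol_ptr entries start.
2. Iterate over the pointers in __DATA,__la_symbol_ptr; for the pointer at index idx, reserved1 + idx is its index in the Dynamic Symbol Table.
3. Read the Symbol Table index from the Dynamic Symbol Table.
4. Read that Symbol Table entry to get the symbol's String Table index (n_strx).
5. Read the symbol name from the String Table.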
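Steps 2 to 5 can be sketched as one C function over in-memory copies of the tables. Assumptions: the tables have already been located; step 1 is represented by the reserved1 parameter; idx is the position of the 8-byte pointer inside the section; my_nlist_64 mirrors struct nlist_64; and the special indirect entries INDIRECT_SYMBOL_LOCAL (0x80000000) and INDIRECT_SYMBOL_ABS (0x40000000) from <mach-o/loader.h> are redeclared with a MY_ prefix and treated as unnamed. All table contents are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MY_INDIRECT_SYMBOL_LOCAL 0x80000000u /* same value as <mach-o/loader.h> */
#define MY_INDIRECT_SYMBOL_ABS   0x40000000u

struct my_nlist_64 {                 /* same layout as struct nlist_64 */
    union { uint32_t n_strx; } n_un;
    uint8_t n_type; uint8_t n_sect; uint16_t n_desc; uint64_t n_value;
};

typedef struct {
    const uint32_t *indirect; uint32_t nindirect;     /* Dynamic (indirect) Symbol Table */
    const struct my_nlist_64 *symtab; uint32_t nsyms; /* Symbol Table */
    const char *strtab; uint32_t strsize;             /* String Table */
} symbol_tables;

/* Hypothetical contents. */
static const char kStrtab[] = "\0_printf\0_malloc\0_free";
static const struct my_nlist_64 kSymtab[] = {
    {{1}, 0x01, 0, 0, 0},  /* _printf (undefined, external) */
    {{9}, 0x01, 0, 0, 0},  /* _malloc */
    {{17}, 0x01, 0, 0, 0}, /* _free */
};
static const uint32_t kIndirect[] = {
    2, 1,                               /* owned by another section, e.g. __stubs */
    0, 2, 1, MY_INDIRECT_SYMBOL_LOCAL,  /* __la_symbol_ptr starts here: reserved1 == 2 */
};
static const symbol_tables kTables = {
    kIndirect, sizeof kIndirect / sizeof kIndirect[0],
    kSymtab, sizeof kSymtab / sizeof kSymtab[0],
    kStrtab, sizeof kStrtab,
};

/* Name of the idx-th pointer in __la_symbol_ptr, or NULL if it has none. */
static const char *lazy_pointer_name(const symbol_tables *t,
                                     uint32_t reserved1, uint32_t idx) {
    uint32_t ind = reserved1 + idx;                /* step 2 */
    if (ind >= t->nindirect) return NULL;
    uint32_t sym = t->indirect[ind];               /* step 3 */
    if (sym & (MY_INDIRECT_SYMBOL_LOCAL | MY_INDIRECT_SYMBOL_ABS))
        return NULL;                               /* no Symbol Table entry */
    if (sym >= t->nsyms) return NULL;
    uint32_t strx = t->symtab[sym].n_un.n_strx;    /* step 4 */
    if (strx >= t->strsize) return NULL;
    return t->strtab + strx;                       /* step 5 */
}
```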

5. Symbol naming rules

C symbol naming is simple: a symbol is generally just the function name with a leading underscore.

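C++ supports advanced features such as namespaces and function overloading, so to avoid symbol collisions the compiler applies symbol mangling to C++ symbols (the rules differ between compilers).

Clang on Apple platforms follows the Itanium C++ ABI, where, for example, int foo(int, int) becomes __Z3fooii: _Z, then the name's length and the name, then one code per parameter type, plus the leading underscore that Mach-O adds just as it does for C symbols.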

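Objective-C symbols are simpler still: a method's symbol has the form +/-[ClassName(CategoryName) method:name:]. Besides these, Objective-C also generates symbols for runtime metadata (for example _OBJC_CLASS_$_ClassName).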
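The three naming schemes can be illustrated with small C string builders. These are deliberately tiny sketches with hypothetical helper names: the C++ one covers only a sliver of Itanium mangling (a function in at most one namespace, with int, double or char parameters; no classes, templates, operators or substitutions), and the Objective-C one simply formats the +/-[Class(Category) selector] pattern described above.

```c
#include <stdio.h>
#include <string.h>

/* C: Mach-O prefixes C names with an underscore, e.g. foo -> _foo. */
static void c_symbol(char *out, size_t cap, const char *name) {
    snprintf(out, cap, "_%s", name);
}

/* Itanium code for a few builtin types; NULL if unsupported here. */
static const char *builtin_code(const char *type) {
    if (strcmp(type, "int") == 0) return "i";
    if (strcmp(type, "double") == 0) return "d";
    if (strcmp(type, "char") == 0) return "c";
    return NULL;
}

/* C++ (tiny Itanium subset); ns may be NULL for a global function.
   The first '_' is the Mach-O prefix on top of Itanium's "_Z".
   Returns 0 on success, -1 on an unsupported type or a too-small buffer. */
static int cxx_symbol(char *out, size_t cap, const char *ns, const char *name,
                      const char *const *params, size_t nparams) {
    int n = ns ? snprintf(out, cap, "__ZN%zu%s%zu%sE", strlen(ns), ns, strlen(name), name)
               : snprintf(out, cap, "__Z%zu%s", strlen(name), name);
    if (n < 0 || (size_t)n >= cap) return -1;
    size_t len = (size_t)n;
    if (nparams == 0) {              /* an empty parameter list is encoded as "v" */
        if (len + 1 >= cap) return -1;
        out[len++] = 'v';
    }
    for (size_t i = 0; i < nparams; i++) {
        const char *code = builtin_code(params[i]);
        if (code == NULL || len + 1 >= cap) return -1;
        out[len++] = code[0];
    }
    out[len] = '\0';
    return 0;
}

/* Objective-C: +/-[Class(Category) selector]; category may be NULL. */
static void objc_method_symbol(char *out, size_t cap, int is_class_method,
                               const char *cls, const char *category, const char *sel) {
    char kind = is_class_method ? '+' : '-';
    if (category)
        snprintf(out, cap, "%c[%s(%s) %s]", kind, cls, category, sel);
    else
        snprintf(out, cap, "%c[%s %s]", kind, cls, sel);
}
```

For instance, the method found later in this article comes out as -[ViewController addArray].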

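6. Kinds of symbols

Symbols can be classified in different ways. By visibility:

Global symbol: visible to other compilation units
Local symbol: visible only within the current compilation unit
By location:

External symbol: not defined in the current file; ld or dyld must resolve it at link time
Non-external symbol: defined in the current file
In nm output, lowercase letters indicate local symbols and uppercase letters indicate global symbols; U means undefined, i.e. an external symbol not defined in this file.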
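How nm arrives at those letters can be sketched from the n_type bits of struct nlist_64. The masks and values below have the same values as N_STAB, N_TYPE, N_EXT, N_UNDF, N_ABS, N_SECT and N_INDR in <mach-o/nlist.h>, redeclared with a MY_ prefix so the sketch compiles anywhere. The section-name mapping is a simplified assumption; the real nm handles more cases (for example common symbols).

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

#define MY_N_STAB 0xe0 /* any of these bits: debugger (stab) entry */
#define MY_N_TYPE 0x0e /* mask for the type bits */
#define MY_N_EXT  0x01 /* external (global) symbol */
#define MY_N_UNDF 0x0  /* undefined */
#define MY_N_ABS  0x2  /* absolute */
#define MY_N_INDR 0xa  /* indirect */
#define MY_N_SECT 0xe  /* defined in section n_sect */

/* Simplified nm-style letter; sectname is only used for N_SECT symbols.
   Returns '-' for stab entries. */
static char nm_letter(uint8_t n_type, const char *sectname) {
    if (n_type & MY_N_STAB) return '-';
    char c;
    switch (n_type & MY_N_TYPE) {
    case MY_N_UNDF: c = 'U'; break;
    case MY_N_ABS:  c = 'A'; break;
    case MY_N_INDR: c = 'I'; break;
    case MY_N_SECT:
        if (sectname && strcmp(sectname, "__text") == 0)      c = 'T';
        else if (sectname && strcmp(sectname, "__data") == 0) c = 'D';
        else if (sectname && strcmp(sectname, "__bss") == 0)  c = 'B';
        else                                                  c = 'S';
        break;
    default: c = '?'; break;
    }
    /* Lowercase = local (non-external) symbol, uppercase = global. */
    if (!(n_type & MY_N_EXT)) c = (char)tolower((unsigned char)c);
    return c;
}
```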

7. In practice

After getting a crash log, focus on the following:

Last Exception Backtrace:
0   CoreFoundation                  0x187b9186c __exceptionPreprocess + 220
1   libobjc.A.dylib                 0x19cbacc50 objc_exception_throw + 59
2   CoreFoundation                  0x187c01e1c _CFThrowFormattedException + 115
3   CoreFoundation                  0x187a6f8a8 -[__NSArrayM objectAtIndex:] + 219
4   TestCase                        0x104ee5e40 _hidden#4_ + 24128 (__hidden#44_:48)
5   libdispatch.dylib               0x187785db0 _dispatch_client_callout + 19

Binary Images:
0x104ee0000 - 0x104ee7fff TestCase arm64  <b22862e527c93aa3b12c9f0cdc950ddf> /var/containers/Bundle/Application/D9E40942-5FC9-4811-BCAF-66EFCC53A9B9/TestCase.app/TestCase
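  1. 0x104ee0000 - 0x104ee7fff: the start and end addresses of the binary after ASLR; from them you can compute a function's address within the app binary;

  2. TestCase: the app name;

  3. arm64: the app's architecture;

  4. b22862e527c93aa3b12c9f0cdc950ddf: the UUID, which matches the binary to its dSYM one-to-one;

  5. /var/containers/Bundle/Application/D9E40942-5FC9-4811-BCAF-66EFCC53A9B9/TestCase.app/TestCase: the app's install path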

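For arm64, without ASLR the start address would be 0x100000000, which is determined by the size of the __PAGEZERO segment. The runtime start address here is 0x104ee0000, so the slide is 0x4ee0000 = 0x104ee0000 - 0x100000000.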

Therefore the crash address 0x104ee5e40 corresponds to address 0x100005E40 = 0x104ee5e40 - 0x4ee0000 in the binary.
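The same arithmetic in a few lines of C. kUnslidBase assumes the default unslid start of 0x100000000 mentioned above; for a real binary, the unslid address should come from the __TEXT segment's vmaddr. The numbers are the ones from this crash log.

```c
#include <stdint.h>

/* Assumed unslid start of __TEXT (the 64-bit default, after __PAGEZERO). */
static const uint64_t kUnslidBase = 0x100000000ULL;

/* slide = runtime load address - unslid base */
static uint64_t aslr_slide(uint64_t load_address) {
    return load_address - kUnslidBase;
}

/* Address inside the binary/dSYM = runtime address - slide */
static uint64_t file_address(uint64_t runtime_address, uint64_t load_address) {
    return runtime_address - aslr_slide(load_address);
}
```

Note that the offset from the load address, 0x104ee5e40 - 0x104ee0000 = 0x5e40, is 24128 in decimal, the same number printed after _hidden#4_ in frame 4 of the backtrace.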

Then open the dSYM in Hopper and press G to go to 0x100005E40, which locates the crashing method.

Alternatively, run dwarfdump --arch arm64 TestCase.app.dSYM --lookup 0x100005E40

The output is:

0x0004847b: DW_TAG_compile_unit
              DW_AT_producer    ("Apple clang version 12.0.0 (clang-1200.0.32.28)")
              DW_AT_language    (DW_LANG_ObjC)
              DW_AT_name    ("/Users/n14637/Desktop/TestCase/TestCase/ViewController.m")
              DW_AT_LLVM_sysroot    ("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS14.3.sdk")
              DW_AT_APPLE_sdk   ("iPhoneOS14.3.sdk")
              DW_AT_stmt_list   (0x0000aecd)
              DW_AT_comp_dir    ("/Users/n14637/Desktop/TestCase")
              DW_AT_APPLE_optimized (true)
              DW_AT_APPLE_major_runtime_vers    (0x02)
              DW_AT_low_pc  (0x0000000100005ccc)
              DW_AT_high_pc (0x000000010000601c)

0x0004869c:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000100005d8c)
                DW_AT_high_pc   (0x0000000100005ec0)
                DW_AT_frame_base    (DW_OP_reg29 W29)
                DW_AT_object_pointer    (0x000486b6)
                DW_AT_call_all_calls    (true)
                DW_AT_name  ("-[ViewController addArray]")
                DW_AT_decl_file ("/Users/n14637/Desktop/TestCase/TestCase/ViewController.m")
                DW_AT_decl_line (45)
                DW_AT_prototyped    (true)
                DW_AT_APPLE_optimized   (true)
Line info: file 'ViewController.m', line 0, column 21, start line 45
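The DIE part of this answer can be reproduced by hand: find the subprogram whose [DW_AT_low_pc, DW_AT_high_pc) range contains the address, where the high PC is exclusive. A minimal C sketch over the range printed above; the pc_range type is hypothetical, and this says nothing about how dwarfdump itself is implemented (the Line info row additionally comes from the line table).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one DIE with a contiguous PC range. */
typedef struct {
    const char *name;
    uint64_t low_pc;  /* DW_AT_low_pc, inclusive */
    uint64_t high_pc; /* DW_AT_high_pc, exclusive */
} pc_range;

/* Copied from the dwarfdump output above. */
static const pc_range kSubprograms[] = {
    {"-[ViewController addArray]", 0x100005d8cULL, 0x100005ec0ULL},
};

/* Return the first range containing addr, or NULL. */
static const pc_range *find_range(const pc_range *ranges, size_t n, uint64_t addr) {
    for (size_t i = 0; i < n; i++)
        if (ranges[i].low_pc <= addr && addr < ranges[i].high_pc)
            return &ranges[i];
    return NULL;
}
```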

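Or use atos, which shows the line number directly without computing the file address by hand. The addresses below come from another launch of the same crash; the offset 0x10086de40 - 0x100868000 = 0x5e40 is the same as above:

# 0x100868000 is the start address from Binary Images, 0x10086de40 is the address from the crash stack
atos -o TestCase.app.dSYM/Contents/Resources/DWARF/TestCase -arch arm64 -l 0x100868000 0x10086de40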