Angr

2018-09-06 Black_Sun

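Introduction to angr

angr grew out of the DARPA Cyber Grand Challenge (CGC), where it was first used for automated attack and defense.
It is a platform-agnostic binary analysis framework.
It is developed by the Computer Security Lab at UCSB and the Shellphish team.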

What can angr do?

Disassembly and intermediate-representation lifting
Program instrumentation
Symbolic execution
Control-flow analysis
Data-dependency analysis
Value-set analysis (VSA)

Installing angr

# dependency
sudo apt-get install python-dev libffi-dev build-essential virtualenvwrapper
# install
# it is best to install angr in a virtual environment
mkvirtualenv angr && pip install angr
# for more details, see https://docs.angr.io/INSTALL.html

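Installing on Ubuntu 16.04

virtualenvwrapper is a tool for managing Python virtual environments. The main reason to use a virtual environment is that angr depends on modified versions of libz3 and libVEX, which could interfere with other programs that use the stock libraries.
Create a new Python virtual environment: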

$ export WORKON_HOME=~/Envs
$ mkdir -p $WORKON_HOME
$ source /usr/share/virtualenvwrapper/virtualenvwrapper.sh
$ mkvirtualenv angr

Basic usage

Project

A Project loads a binary and is the starting point for everything in angr.

>>> import angr
>>> proj = angr.Project('/bin/true')

Basic properties

Inspect the basic properties of the loaded binary:

arch
entry
filename, absolute filename of the binary
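loader
- min_addr
- max_addr
- main_object, the binary the Project was created from, that is, the main binary.
  - pic, whether it is position-independent
  - execstack, whether the stack is executable
- shared_objects, the loaded shared objects: their names and mapped addresses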

>>> proj.arch
<Arch AMD64 (LE)>
>>> proj.entry
0x401670
>>> proj.filename
'/bin/true'
>>> proj.loader
<Loaded true, maps [0x400000:0x5004000]>

>>> proj.loader.shared_objects # may look a little different for you!
{'ld-linux-x86-64.so.2': <ELF Object ld-2.24.so, maps [0x2000000:0x2227167]>,
 'libc.so.6': <ELF Object libc-2.24.so, maps [0x1000000:0x13c699f]>}

>>> proj.loader.min_addr
0x400000
>>> proj.loader.max_addr
0x5004000

>>> proj.loader.main_object  # we've loaded several binaries into this project. Here's the main one!
<ELF Object true, maps [0x400000:0x60721f]>

>>> proj.loader.main_object.execstack  # sample query: does this binary have an executable stack?
False
>>> proj.loader.main_object.pic  # sample query: is this binary position-independent?
True

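Loading options

Basic options

auto_load_libs: whether to automatically load the binary's shared-library dependencies.
except_missing_libs: raise an exception when a dependency cannot be loaded.
force_load_libs: libraries that must be loaded.
skip_libs: libraries that must not be loaded.
custom_ld_path: additional search paths for shared libraries, searched before the default ones.

Advanced options

main_opts is a mapping from option names to option values for the main binary.
lib_opts is a mapping from library name to a dictionary of option names and values for that library.

angr.Project('/bin/true', main_opts={'backend': 'ida', 'custom_arch': 'i386'}, lib_opts={'libc.so.6': {'backend': 'elf'}})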

Loader

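The loader, CLE ("CLE Loads Everything"), loads a binary into a virtual address space. Each binary format has its own loader backend (cle.Backend); for example, cle.ELF loads ELF files. Every binary angr loads gets its own region of the address space, but not every object in the address space corresponds to a binary.

The main object

We can query basic information about the main object that the loader loaded: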

# This is the "main" object, the one that you directly specified when loading the project
>>> proj.loader.main_object
<ELF Object true, maps [0x400000:0x60105f]>
>>> obj = proj.loader.main_object

# The entry point of the object
>>> obj.entry
0x400580

>>> obj.min_addr, obj.max_addr
(0x400000, 0x60105f)

# Retrieve this ELF's segments and sections
>>> obj.segments
<Regions: [<ELFSegment offset=0x0, flags=0x5, filesize=0xa74, vaddr=0x400000, memsize=0xa74>,
           <ELFSegment offset=0xe28, flags=0x6, filesize=0x228, vaddr=0x600e28, memsize=0x238>]>
>>> obj.sections
<Regions: [<Unnamed | offset 0x0, vaddr 0x0, size 0x0>,
           <.interp | offset 0x238, vaddr 0x400238, size 0x1c>,
           <.note.ABI-tag | offset 0x254, vaddr 0x400254, size 0x20>,
            ...etc

# You can get an individual segment or section by an address it contains:
>>> obj.find_segment_containing(obj.entry)
<ELFSegment offset=0x0, flags=0x5, filesize=0xa74, vaddr=0x400000, memsize=0xa74>
>>> obj.find_section_containing(obj.entry)
<.text | offset 0x580, vaddr 0x400580, size 0x338>

# Get the address of the PLT stub for a symbol
>>> addr = obj.plt['__libc_start_main']
>>> addr
0x400540
>>> obj.reverse_plt[addr]
'__libc_start_main'

# Show the prelinked base of the object and the location it was actually mapped into memory by CLE
>>> obj.linked_base
0x400000
>>> obj.mapped_base
0x400000

Other loaded objects

# All loaded objects
>>> proj.loader.all_objects
[<ELF Object fauxware, maps [0x400000:0x60105f]>,
 <ELF Object libc.so.6, maps [0x1000000:0x13c42bf]>,
 <ELF Object ld-linux-x86-64.so.2, maps [0x2000000:0x22241c7]>,
 <ELFTLSObject Object cle##tls, maps [0x3000000:0x300d010]>,
 <KernelObject Object cle##kernel, maps [0x4000000:0x4008000]>,
 <ExternObject Object cle##externs, maps [0x5000000:0x5008000]>]


# This is a dictionary mapping from shared object name to object
>>> proj.loader.shared_objects
{ 'libc.so.6': <ELF Object libc.so.6, maps [0x1000000:0x13c42bf]>,
  'ld-linux-x86-64.so.2': <ELF Object ld-linux-x86-64.so.2, maps [0x2000000:0x22241c7]>}

# Here's all the objects that were loaded from ELF files
# If this were a windows program we'd use all_pe_objects!
>>> proj.loader.all_elf_objects
[<ELF Object true, maps [0x400000:0x60105f]>,
 <ELF Object libc.so.6, maps [0x1000000:0x13c42bf]>,
 <ELF Object ld-linux-x86-64.so.2, maps [0x2000000:0x22241c7]>]

# Here's the "externs object", which we use to provide addresses for unresolved imports and angr internals
>>> proj.loader.extern_object
<ExternObject Object cle##externs, maps [0x5000000:0x5008000]>

# This object is used to provide addresses for emulated syscalls
>>> proj.loader.kernel_object
<KernelObject Object cle##kernel, maps [0x4000000:0x4008000]>

# Finally, you can get a reference to an object given an address in it
>>> proj.loader.find_object_containing(0x400000)
<ELF Object true, maps [0x400000:0x60105f]>

Symbols and relocations

We can also use CLE to work with the symbols in a binary.

Look up a symbol by passing its name or its address:

>>> malloc = proj.loader.find_symbol('malloc')
>>> malloc
<Symbol "malloc" in libc.so.6 at 0x1054400>

Basic symbol information: its name, its owner, and its address:

>>> malloc.name
'malloc'

>>> malloc.owner_obj
<ELF Object libc.so.6, maps [0x1000000:0x13c42bf]>

# .rebased_addr is its address in the global address space. This is what is shown in the print output.
>>> malloc.rebased_addr
0x1054400
# .linked_addr is its address relative to the prelinked base of the binary. This is the address reported in, for example, readelf(1)
>>> malloc.linked_addr
0x54400
# .relative_addr is its address relative to the object base. This is known in the literature (particularly the Windows literature) as an RVA (relative virtual address).
>>> malloc.relative_addr
0x54400
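
The three addresses are related by simple arithmetic. Here is a minimal sketch in plain Python using the numbers printed above. The helper names are made up for illustration, and libc's linked base of 0 is an assumption (it is consistent with linked_addr and relative_addr printing the same value):

```python
# Illustrative only: plain integers standing in for CLE's symbol attributes.
MAPPED_BASE = 0x1000000   # where CLE mapped libc.so.6 (from the output above)
LINKED_BASE = 0x0         # ASSUMED prelinked base of libc
RELATIVE_ADDR = 0x54400   # malloc's offset from the object base (its RVA)

def rebased_addr(relative, mapped_base):
    """Address in the global address space of the project."""
    return mapped_base + relative

def linked_addr(relative, linked_base):
    """Address relative to the prelinked base, as a tool like readelf reports it."""
    return linked_base + relative

print(hex(rebased_addr(RELATIVE_ADDR, MAPPED_BASE)))  # 0x1054400
print(hex(linked_addr(RELATIVE_ADDR, LINKED_BASE)))   # 0x54400
```

For the main binary above, linked_base and mapped_base are both 0x400000, so its rebased and linked addresses coincide.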

Import and export information for a symbol:

>>> malloc.is_export
True
>>> malloc.is_import
False

# On Loader, the method is find_symbol because it performs a search operation to find the symbol.
# On an individual object, the method is get_symbol because there can only be one symbol with a given name.
>>> main_malloc = proj.loader.main_object.get_symbol("malloc")
>>> main_malloc
<Symbol "malloc" in true (import)>
>>> main_malloc.is_export
False
>>> main_malloc.is_import
True
>>> main_malloc.resolvedby
<Symbol "malloc" in libc.so.6 at 0x1054400>

Backends

backend name	description	requires `custom_arch`?
elf	Static loader for ELF files based on PyELFTools	no
pe	Static loader for PE files based on PEFile	no
mach-o	Static loader for Mach-O files. Does not support dynamic linking or rebasing.	no
cgc	Static loader for Cyber Grand Challenge binaries	no
backedcgc	Static loader for CGC binaries that allows specifying memory and register backers	no
elfcore	Static loader for ELF core dumps	no
ida	Launches an instance of IDA to parse the file	yes
blob	Loads the file into memory as a flat image	yes

Symbolic Function

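By default, angr tries to replace calls to library functions with its own emulated implementations, which are called SimProcedures. All of them can be found in angr.SIM_PROCEDURES, which is keyed first by package name (libc, posix, win32, etc...) and then by function name.

Note that:

When no SimProcedure is available for a function and auto_load_libs is True, the real library function is executed instead.
...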

hook

Hooking an address makes angr run a function you supply instead of the code at that address.

>>> stub_func = angr.SIM_PROCEDURES['stubs']['ReturnUnconstrained'] # this is a CLASS
>>> proj.hook(0x10000, stub_func())  # hook with an instance of the class

>>> proj.is_hooked(0x10000)            # these functions should be pretty self-explanatory
True
>>> proj.hooked_by(0x10000)
<ReturnUnconstrained>
>>> proj.unhook(0x10000)

# proj.hook can also be used as a function decorator. The optional length keyword argument makes execution jump that many bytes forward after your hook finishes.
>>> @proj.hook(0x20000, length=5)
... def my_hook(state):
...     state.regs.rax = 1

>>> proj.is_hooked(0x20000)
True

factory

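Why

Many classes in angr need a project in order to be instantiated. Going through factory avoids passing the project object around.
factory also provides some convenient constructors.

Methods

block(addr)
- Extracts the basic block at the given address and returns a Block object.
- Note that the basic block is the unit in which angr analyzes programs.
bitvector
- Registers are described using bitvectors.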

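Basic blocks

Attributes

instructions
- The number of instructions in the basic block.
instruction_addrs
- The address of each instruction in the basic block.
capstone
- The capstone block object.
vex

Methods

pp()
- Pretty-prints the disassembly of the basic block.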

state

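A Project only describes the program's initial image. A state captures the concrete state of the simulated process when execution reaches a particular instruction. In angr this is represented by a SimState.

All values in a state are bitvectors.
You can store plain Python integers into registers and memory directly, and angr converts them to bitvectors.

Preset states

The factory provides constructors for the default state of the program at a given point: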

.blank_state() constructs a "blank slate" blank state, with most of its data left uninitialized. When accessing uninitialized data, an unconstrained symbolic value will be returned.
.entry_state() constructs a state ready to execute at the main binary's entry point.
.full_init_state() constructs a state that is ready to execute through any initializers that need to be run before the main binary's entry point, for example, shared library constructors or preinitializers. When it is finished with these it will jump to the entry point.
.call_state() constructs a state ready to execute a given function.
- you should call it with .call_state(addr, arg1, arg2, ...), where addr is the address of the function you want to call and argN is the Nth argument to that function, either as a python integer, string, or array, or a bitvector.

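Basic state information

Registers

state.regs.rip

Memory

Pattern: state.mem[addr].type.xxx

addr is the memory address to access.
type specifies the type that the data at that address should be interpreted as.
xxx
- Omitted: assign to the expression directly to store data.
- Use .resolved to get the data as a bitvector.
- Use .concrete to get the data as a Python int.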

>>> import angr
>>> proj = angr.Project('/bin/true')
>>> state = proj.factory.entry_state()

# copy rsp to rbp
>>> state.regs.rbp = state.regs.rsp

# store rdx to memory at 0x1000
>>> state.mem[0x1000].uint64_t = state.regs.rdx

# dereference rbp
>>> state.regs.rbp = state.mem[state.regs.rbp].uint64_t.resolved

# add rax, qword ptr [rsp + 8]
>>> state.regs.rax += state.mem[state.regs.rsp + 8].uint64_t.resolved

File system

Execution

Basic execution

>>> proj = angr.Project('examples/fauxware/fauxware')
>>> state = proj.factory.entry_state()
>>> while True:
...     succ = state.step()
...     if len(succ.successors) == 2:
...         break
...     state = succ.successors[0]

>>> state1, state2 = succ.successors
>>> state1
<SimState @ 0x400629>
>>> state2
<SimState @ 0x400699>

Low-level memory access

By default, state.memory stores and loads data in big-endian order.

>>> s = proj.factory.blank_state()
>>> s.memory.store(0x4000, s.solver.BVV(0x0123456789abcdef0123456789abcdef, 128))
>>> s.memory.load(0x4004, 6) # load-size is in bytes
<BV48 0x89abcdef0123>
>>> import archinfo
>>> s.memory.load(0x4000, 4, endness=archinfo.Endness.LE)
<BV32 0x67452301>
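
The byte order is easy to check by hand. Below is a small plain-Python model of the store and the two loads (no angr needed). The load helper is a stand-in written for this note, not an angr API, and its offsets are relative to 0x4000:

```python
# Plain-Python model of the 128-bit store above.
value = 0x0123456789abcdef0123456789abcdef
memory = value.to_bytes(16, "big")   # default store: most significant byte first

def load(offset, size, endness="big"):
    """Read `size` bytes at `offset` and interpret them with the given byte order."""
    return int.from_bytes(memory[offset:offset + size], endness)

print(hex(load(4, 6)))             # 0x89abcdef0123
print(hex(load(0, 4, "little")))   # 0x67452301
```

The first four bytes in memory are 01 23 45 67, so reading them little-endian reverses them byte by byte.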

State Option

# Example: enable lazy solves, an option that causes state satisfiability to be checked as infrequently as possible.
# This change to the settings will be propagated to all successor states created from this state after this line.
>>> s.options.add(angr.options.LAZY_SOLVES)

# Create a new state with lazy solves enabled
>>> s = proj.factory.entry_state(add_options={angr.options.LAZY_SOLVES})

# Create a new state without simplification options enabled
>>> s = proj.factory.entry_state(remove_options=angr.options.simplification)

solver

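The solver is essentially a constraint-solving engine.

Working with bitvectors

Converting between bitvectors and Python integers.

Convert a given value into a bitvector of the specified width: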

# 64-bit bitvectors with concrete values 1 and 100
>>> one = state.solver.BVV(1, 64)
>>> one
<BV64 0x1>
>>> one_hundred = state.solver.BVV(100, 64)
>>> one_hundred
<BV64 0x64>

# create a 27-bit bitvector with concrete value 9
>>> weird_nine = state.solver.BVV(9, 27)
>>> weird_nine
<BV27 0x9>

Bitvector arithmetic: both operands must have the same width.

>>> one + one_hundred
<BV64 0x65>

# You can provide normal python integers and they will be coerced to the appropriate type:
>>> one_hundred + 0x100
<BV64 0x164>

# The semantics of normal wrapping arithmetic apply
>>> one_hundred - one*200
<BV64 0xffffffffffffff9c>

# use zero_extend to extend the length of a bitvector
# there is also sign_extend
>>> weird_nine.zero_extend(64 - 27)
<BV64 0x9>
>>> one + weird_nine.zero_extend(64 - 27)
<BV64 0xa>
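
To see where these results come from, here is a rough plain-Python model of fixed-width arithmetic. The bvv and sign_extend helpers are written for this note and only mimic the behaviour on concrete values; they are not the claripy implementation:

```python
def bvv(value, bits):
    """Truncate a Python int to `bits` bits, like a concrete bitvector value."""
    return value & ((1 << bits) - 1)

def sign_extend(value, from_bits, to_bits):
    """Widen a value, copying its top bit into the new high bits."""
    if (value >> (from_bits - 1)) & 1:
        value -= 1 << from_bits      # reinterpret as a negative number
    return bvv(value, to_bits)

one, one_hundred = bvv(1, 64), bvv(100, 64)
print(hex(bvv(one_hundred - one * 200, 64)))   # 0xffffffffffffff9c (wraps around)
print(hex(bvv(9, 64)))                          # zero-extending 9 leaves it as 0x9
print(hex(sign_extend(0xff, 8, 16)))            # 0xffff
print(hex(sign_extend(0x7f, 8, 16)))            # 0x7f
```

Zero extension pads with zero bits, so it never changes the unsigned value. Sign extension only differs when the top bit of the narrower value is set.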

Symbolic bitvectors

# Create a bitvector symbol named "x" of length 64 bits
>>> x = state.solver.BVS("x", 64)
>>> x
<BV64 x_9_64>
>>> y = state.solver.BVS("y", 64)
>>> y
<BV64 y_10_64>

Arithmetic mixing symbols and concrete values

>>> x + one
<BV64 x_9_64 + 0x1>

>>> (x + one) / 2
<BV64 (x_9_64 + 0x1) / 0x2>

>>> x - y
<BV64 x_9_64 - y_10_64>

Inspecting the AST

>>> tree = (x + 1) / (y + 2)
>>> tree
<BV64 (x_9_64 + 0x1) / (y_10_64 + 0x2)>
>>> tree.op
'__div__'
>>> tree.args
(<BV64 x_9_64 + 0x1>, <BV64 y_10_64 + 0x2>)
>>> tree.args[0].op
'__add__'
>>> tree.args[0].args
(<BV64 x_9_64>, <BV64 0x1>)
>>> tree.args[0].args[1].op
'BVV'
>>> tree.args[0].args[1].args
(1, 64)
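
Since every node exposes .op and .args, a tree like this can be walked recursively. The sketch below uses a namedtuple as a stand-in for an AST node so that it runs without angr. Node and leaves are names invented for this example, and the walker is only assumed to carry over to real ASTs because it relies on nothing but those two attributes:

```python
from collections import namedtuple

# Stand-in for an AST node: anything with .op and .args works below.
Node = namedtuple("Node", ["op", "args"])

def leaves(ast):
    """Yield the leaf nodes (symbols and constants) of an expression tree."""
    children = [a for a in ast.args if hasattr(a, "op") and hasattr(a, "args")]
    if not children:
        yield ast
    else:
        for child in children:
            yield from leaves(child)

# (x + 1) / (y + 2), mirroring the tree printed above
x = Node("BVS", ("x_9_64",))
y = Node("BVS", ("y_10_64",))
tree = Node("__div__", (Node("__add__", (x, Node("BVV", (1, 64)))),
                        Node("__add__", (y, Node("BVV", (2, 64))))))
print([leaf.op for leaf in leaves(tree)])   # ['BVS', 'BVV', 'BVS', 'BVV']
```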

Symbolic constraints

Comparisons are unsigned by default.

>>> x == 1
<Bool x_9_64 == 0x1>
>>> x == one
<Bool x_9_64 == 0x1>
>>> x > 2
<Bool x_9_64 > 0x2>
>>> x + y == one_hundred + 5
<Bool (x_9_64 + y_10_64) == 0x69>
>>> one_hundred > 5
<Bool True>
>>> one_hundred > -5
<Bool False>
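
The last result looks odd until -5 is written out as a 64-bit pattern. Here is a plain-Python illustration; as_unsigned and as_signed are helpers made up for this note. In angr itself, the signed comparison is spelled one_hundred.SGT(-5).

```python
BITS = 64
MASK = (1 << BITS) - 1

def as_unsigned(value):
    """Two's-complement encoding of a Python int in 64 bits."""
    return value & MASK

def as_signed(value):
    """Interpret a 64-bit pattern as a signed integer."""
    value &= MASK
    return value - (1 << BITS) if value >> (BITS - 1) else value

print(hex(as_unsigned(-5)))                   # 0xfffffffffffffffb
print(as_unsigned(100) > as_unsigned(-5))     # False: the unsigned comparison above
print(as_signed(100) > as_signed(-5))         # True: what a signed comparison gives
```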

Evaluating conditions

>>> yes = one == 1
>>> no = one == 2
>>> maybe = x == y
>>> state.solver.is_true(yes)
True
>>> state.solver.is_false(yes)
False
>>> state.solver.is_true(no)
False
>>> state.solver.is_false(no)
True
>>> state.solver.is_true(maybe)
False
>>> state.solver.is_false(maybe)
False

Constraint solving

Basic steps

Add constraints
Solve

>>> state.solver.add(x > y)
>>> state.solver.add(y > 2)
>>> state.solver.add(10 > x)
>>> state.solver.eval(x)
4

# get a fresh state without constraints
>>> state = proj.factory.entry_state()
>>> input = state.solver.BVS('input', 64)
>>> operation = (((input + 4) * 3) >> 1) + input
>>> output = 200
>>> state.solver.add(operation == output)
>>> state.solver.eval(input)
0x3333333333333381
# If we add conflicting or contradictory constraints
>>> state.solver.add(input < 2**32)
>>> state.satisfiable()
False
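
The large answer makes sense once 64-bit wrap-around is taken into account. The following plain-Python check re-computes the expression for the value the solver returned. It assumes that >> on a bitvector is an arithmetic (sign-preserving) shift, which is the interpretation under which the printed answer works out. The operation helper is written for this note.

```python
MASK = (1 << 64) - 1

def to_signed(v):
    """Interpret a 64-bit pattern as a signed integer."""
    return v - (1 << 64) if v >> 63 else v

def operation(inp):
    """(((input + 4) * 3) >> 1) + input on 64-bit values, with an arithmetic shift."""
    t = ((inp + 4) * 3) & MASK
    t = (to_signed(t) >> 1) & MASK     # Python's >> on a negative int is arithmetic
    return (t + inp) & MASK

solution = 0x3333333333333381
print(operation(solution))                               # 200
print(solution < 2**32)                                  # False
print([i for i in range(256) if operation(i) == 200])    # []
```

For small inputs nothing wraps and the expression grows steadily (77 gives 198, 78 gives 201), so no input below 2**32 produces 200. That is why adding input < 2**32 makes the state unsatisfiable.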

Simulation Managers

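A state describes the program at the moment execution reaches some address. A Simulation Manager controls how the program moves from one state to the next. It is the main interface in angr for controlling simulated execution.

Creating a simulation manager

>>> simgr = proj.factory.simgr(state)
>>> simgr
<SimulationManager with 1 active>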

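Inspecting states

A manager can hold many states, and each one can be inspected individually. The active stash is initialized with the state we passed in.

>>> simgr.active
[<SimState @ 0x401670>]

Execution

Stepping executes one basic block. It does not modify the state that was originally passed in.

>>> simgr.step()
>>> simgr.active
[<SimState @ 0x1020300>]
>>> simgr.active[0].regs.rip                 # new and exciting!
<BV64 0x1020300>
>>> state.regs.rip                           # still the same!
<BV64 0x401670>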

# Step until the first symbolic branch
>>> while len(simgr.active) == 1:
...    simgr.step()

>>> simgr
<SimulationManager with 2 active>
>>> simgr.active
[<SimState @ 0x400692>, <SimState @ 0x400699>]

# Step until everything terminates
>>> simgr.run()
>>> simgr
<SimulationManager with 3 deadended>

Stash Management

Moving states between stashes

>>> simgr.move(from_stash='deadended', to_stash='authenticated', filter_func=lambda s: b'Welcome' in s.posix.dumps(1))
>>> simgr
<SimulationManager with 2 authenticated, 1 deadended>
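
Conceptually, stashes are just named lists of states, and move transfers the states that pass the filter. Here is a toy model in plain Python. It is not angr's implementation, and the "states" are hypothetical byte strings standing in for each state's stdout:

```python
def move(stashes, from_stash, to_stash, filter_func=None):
    """Move every state in `from_stash` that passes `filter_func` into `to_stash`."""
    keep, moved = [], []
    for state in stashes.get(from_stash, []):
        (moved if filter_func is None or filter_func(state) else keep).append(state)
    stashes[from_stash] = keep
    stashes.setdefault(to_stash, []).extend(moved)
    return stashes

# Hypothetical outputs of three finished runs.
stashes = {"deadended": [b"Welcome to the admin console!",
                         b"Go away!",
                         b"Welcome to the admin console!"]}
move(stashes, "deadended", "authenticated", lambda out: b"Welcome" in out)
print({name: len(states) for name, states in stashes.items()})
# {'deadended': 1, 'authenticated': 2}
```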

Iterating over stashes

>>> for s in simgr.deadended + simgr.authenticated:
...     print(hex(s.addr))
0x1000030
0x1000078
0x1000078
# If you prepend the name of a stash with one_, you will be given the first state in the stash. 
>>> simgr.one_deadended
<SimState @ 0x1000030>
#  If you prepend the name of a stash with mp_, you will be given a mulpyplexed version of the stash.
>>> simgr.mp_authenticated
MP([<SimState @ 0x1000078>, <SimState @ 0x1000078>])
>>> simgr.mp_authenticated.posix.dumps(0)
MP(['\x00\x00\x00\x00\x00\x00\x00\x00\x00SOSNEAKY\x00',
    '\x00\x00\x00\x00\x00\x00\x00\x00\x00S\x80\x80\x80\x80@\x80@\x00'])

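explore

explore searches for a state that reaches a given target. It usually takes a find argument, which can be:

the address of an instruction to stop at
a set of instruction addresses to stop at
a function that takes a state and returns whether it meets the requirement

States that match are placed in the found stash.

You can also pass an avoid condition to explore, in the same formats, to keep angr from exploring states that match it.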

>>> proj = angr.Project('examples/CSCI-4968-MBE/challenges/crackme0x00a/crackme0x00a')
>>> simgr = proj.factory.simgr()
>>> simgr.explore(find=lambda s: b"Congrats" in s.posix.dumps(1))
<SimulationManager with 1 active, 1 found>
>>> s = simgr.found[0]
>>> print(s.posix.dumps(1))
Enter password: Congrats!

>>> flag = s.posix.dumps(0)
>>> print(flag)
g00dJ0B!
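
The same pattern can be wrapped into a small reusable helper that also passes an avoid condition. This is only a sketch: stdout_contains and solve are names invented here, the b"Wrong" failure marker is hypothetical and has to be replaced with whatever the target binary prints, and it assumes a Python 3 version of angr, where posix.dumps returns bytes.

```python
from types import SimpleNamespace

def stdout_contains(marker: bytes):
    """Build a find/avoid predicate: true once `marker` has been written to stdout."""
    def predicate(state):
        return marker in state.posix.dumps(1)   # file descriptor 1 is stdout
    return predicate

def solve(binary_path, good=b"Congrats", bad=b"Wrong"):
    """Return the stdin that reaches `good`, or None. Needs angr and a real binary."""
    import angr   # imported lazily so the predicate can be tried without angr
    proj = angr.Project(binary_path, auto_load_libs=False)
    simgr = proj.factory.simgr()
    simgr.explore(find=stdout_contains(good), avoid=stdout_contains(bad))
    return simgr.found[0].posix.dumps(0) if simgr.found else None

# Try the predicate on a stand-in object shaped like a state (not a real SimState).
fake = SimpleNamespace(posix=SimpleNamespace(dumps=lambda fd: b"Enter password: Congrats!"))
print(stdout_contains(b"Congrats")(fake))   # True
print(stdout_contains(b"Wrong")(fake))      # False
```

Avoiding the failure message lets angr drop those paths early instead of stepping them to the end.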

extra

Stash types

Stash	Description
active	This stash contains the states that will be stepped by default, unless an alternate stash is specified.
deadended	A state goes to the deadended stash when it cannot continue the execution for some reason, including no more valid instructions, unsat state of all of its successors, or an invalid instruction pointer.
pruned	When using `LAZY_SOLVES`, states are not checked for satisfiability unless absolutely necessary. When a state is found to be unsat in the presence of `LAZY_SOLVES`, the state hierarchy is traversed to identify when, in its history, it initially became unsat. All states that are descendants of that point (which will also be unsat, since a state cannot become un-unsat) are pruned and put in this stash.
unconstrained	If the `save_unconstrained` option is provided to the SimulationManager constructor, states that are determined to be unconstrained (i.e., with the instruction pointer controlled by user data or some other source of symbolic data) are placed here.
unsat	If the `save_unsat` option is provided to the SimulationManager constructor, states that are determined to be unsatisfiable (i.e., they have constraints that are contradictory, like the input having to be both "AAAA" and "BBBB" at the same time) are placed here.

analysis

Analyses provide various kinds of information about the program.

For example, a control-flow graph:

# Originally, when we loaded this binary it also loaded all its dependencies into the same virtual address space
# This is undesirable for most analysis.
>>> proj = angr.Project('/bin/true', auto_load_libs=False)
>>> cfg = proj.analyses.CFGFast()
>>> cfg
<CFGFast Analysis Result at 0x2d85130>

# cfg.graph is a networkx DiGraph full of CFGNode instances
# You should go look up the networkx APIs to learn how to use this!
>>> cfg.graph
<networkx.classes.digraph.DiGraph at 0x2da43a0>
>>> len(cfg.graph.nodes())
951

# To get the CFGNode for a given address, use cfg.get_any_node
>>> entry_node = cfg.get_any_node(proj.entry)
>>> len(list(cfg.graph.successors(entry_node)))
2

class angr.block.CapstoneBlock(addr, insns, thumb, arch)

Deep copy of the capstone blocks, which have serious issues with having extended lifespans outside of capstone itself
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////

angr-0ctf_momo

1. Reverse engineering is needed to find the three conditions for constraint solving

dx, dword ptr [edx*4 + 0x81fe260]
al, byte ptr [0x81fe6e0]
dl, byte ptr [0x81fe6e4]

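2. You need to be able to reverse programs compiled with movfuscator

1. Analyze by hand with qira + IDA,
2. or use a movfuscator deobfuscator,
3. or use a Makefile plus binary instrumentation.
4. How well angr can solve the challenge depends on how well you understand the program from reversing it.

3. Parts of angr's constraint-solving process are still not entirely clear to me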

References:
1: Learning angr (part 4):
http://www.cnblogs.com/fancystar/p/7893248.html
2: Makefile + binary instrumentation:
https://blog.xy14qg.top/2016/0ctf-2016-writeup/#momo-reverse
3: angr example walkthrough, 0ctf_momo_3:
http://blog.csdn.net/doudoudouzoule/article/details/79537019
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
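Related material:
I found an open-source project online, https://github.com/kirschju/demovfuscator, a deobfuscator built specifically to deal with movfuscator, and installed it right away.
Solving momo's movfuscator with qira:
http://blog.csdn.net/charlie_heng/article/details/79206863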