AT&T Assembly Syntax
vivek, Mon, 2003-09-01 23:53
Updated: May/10 '06
Chinese translation by harry, Thursday, 2018-06-28
This article is a quick-and-dirty introduction to the AT&T assembly language syntax as implemented in the GNU Assembler, as(1). To a first-timer the AT&T syntax may seem a bit confusing, but if you have any assembly language programming background, it is easy to pick up once you keep a few rules in mind. I assume you have some familiarity with what is commonly referred to as the INTEL syntax for assembly language instructions, as described in the x86 manuals. Because of its simplicity, I use the NASM (Netwide Assembler) variant of the INTEL syntax to illustrate the differences between the two formats.
The GNU assembler is part of the GNU Binary Utilities (binutils) and serves as a back end to the GNU Compiler Collection. Although as is not the preferred assembler for writing reasonably large assembly programs, it is a vital part of contemporary Unix-like systems, especially for kernel-level hacking. It is often criticised for its cryptic AT&T-style syntax; the usual explanation is that as was written primarily as a back end for GCC, with little concern for "developer-friendliness". If you are an assembly programmer coming from an INTEL-syntax background, you may find it somewhat stifling in terms of code readability and code generation. Nevertheless, the code bases of many operating systems depend on as to generate their low-level code.
The Basic Format
The structure of a program in AT&T syntax is similar to that of any other assembler syntax: a series of directives, labels and instructions, where each instruction consists of a mnemonic followed by at most three operands. The most prominent difference in AT&T syntax is the ordering of the operands.
For example, the general format of a basic data movement instruction in INTEL syntax is
```
mnemonic destination, source
```
whereas in AT&T syntax the general format is
```
mnemonic source, destination
```
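The reversal can be modelled as a toy rewriter. This is a minimal Python sketch, assuming the operands are already spelled in AT&T form (%-prefixed registers, $-prefixed literals); the name `intel_to_att_order` is hypothetical. It simply reverses the operand list, which is how GAS orders the operands of ordinary instructions, including three-operand ones such as imul.

```python
def intel_to_att_order(mnemonic: str, *intel_operands: str) -> str:
    """Rewrite an instruction from INTEL operand order to AT&T operand order.

    Hypothetical helper: operands are assumed to be spelled in AT&T form
    already; only their order changes.
    """
    att_operands = list(reversed(intel_operands))  # destination moves to the end
    if not att_operands:
        return mnemonic
    return f"{mnemonic} {', '.join(att_operands)}"


# INTEL: mov bx, ax          -> AT&T: mov %ax, %bx
print(intel_to_att_order("mov", "%bx", "%ax"))
# INTEL: imul eax, ebx, 10   -> AT&T: imul $10, %ebx, %eax
print(intel_to_att_order("imul", "%eax", "%ebx", "$10"))
```

The GNU as manual documents a few exceptions to this reversal, so treat the sketch as a rule of thumb rather than a complete description.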
To some people (myself included), this format is more intuitive. The following sections describe the types of operands that AT&T assembler instructions take on the x86 architecture.
Registers
All register names of the IA-32 architecture must be prefixed by a '%' sign, e.g. %al, %bx, %ds, %cr0, etc.
```asm
mov %ax, %bx
```
This mov instruction copies the value of the 16-bit register AX into the 16-bit register BX.
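The prefix rule is mechanical enough to model in a few lines. In this Python sketch, `att_register` and `IA32_REGISTERS` are hypothetical names, and the register set is deliberately incomplete: it covers the general-purpose, segment and a few control registers for illustration only.

```python
# Illustrative, non-exhaustive set of IA-32 register names (an assumption,
# not the full register file).
IA32_REGISTERS = {
    "al", "ah", "bl", "bh", "cl", "ch", "dl", "dh",
    "ax", "bx", "cx", "dx", "si", "di", "sp", "bp",
    "eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp",
    "cs", "ds", "es", "fs", "gs", "ss",
    "cr0", "cr2", "cr3", "cr4",
}


def att_register(name: str) -> str:
    """Return the AT&T spelling of a register: its name prefixed with '%'."""
    reg = name.lower()
    if reg not in IA32_REGISTERS:
        raise ValueError(f"not a known IA-32 register: {name!r}")
    return "%" + reg


print(att_register("AX"), att_register("cr0"))  # prints: %ax %cr0
```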
Literal Values
All literal values must be prefixed by a '$' sign. For example,
```asm
mov $100, %bx
mov $'A, %al
```
The first instruction moves the value 100 into register BX, and the second moves the numeric value of the ASCII character A (65) into register AL. In GAS a character constant is written as a single quote followed by the character ('A); writing $A instead would refer to the address of a symbol named A. To make things clearer, note that the following is not a valid instruction,
```asm
mov %bx, $100
```
because it tries to move the value in register BX into a literal value, which makes no sense: a literal can never be a destination.
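Both rules in this section (literals take a '$' prefix, and a literal can never be the destination) fit in a short Python sketch. The helpers `att_immediate` and `att_mov` are hypothetical names, and the character form assumes GAS's single-quote character-constant syntax.

```python
def att_immediate(value) -> str:
    """Spell an immediate operand in AT&T syntax.

    Integers become '$<n>'; a one-character string becomes a GAS character
    constant such as $'A (a leading single quote before the character).
    """
    if isinstance(value, str):
        if len(value) != 1:
            raise ValueError("expected a single character")
        return "$'" + value
    return f"${value}"


def att_mov(src: str, dst: str) -> str:
    """Build 'mov src, dst', rejecting an immediate as the destination."""
    if dst.startswith("$"):
        raise ValueError(f"immediate {dst} cannot be a destination")
    return f"mov {src}, {dst}"


print(att_mov(att_immediate(100), "%bx"))  # mov $100, %bx
print(att_mov(att_immediate("A"), "%al"))  # mov $'A, %al
print(ord("A"))                            # 65, the value that ends up in AL
```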
Memory Addressing
In AT&T syntax, memory is referenced in the following way,
```
segment-override:signed-offset(base,index,scale)
```
Parts of it can be omitted depending on the address you want. For example,
```asm
%es:100(%eax,%ebx,2)
```
refers to offset eax + ebx*2 + 100 in the segment selected by ES. The scale must be 1, 2, 4 or 8. Note that the offset and the scale must not be prefixed by '$'. A few more examples with their NASM-syntax equivalents should make things clearer.
GAS memory operand | NASM memory operand |
---|---|
100 | [100] |
%es:100 | [es:100] |
(%eax) | [eax] |
(%eax,%ebx) | [eax+ebx] |
(%ecx,%ebx,2) | [ecx+ebx*2] |
(,%ebx,2) | [ebx*2] |
-10(%eax) | [eax-10] |
%ds:-10(%ebp) | [ds:ebp-10] |
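The correspondence in the table is regular enough to automate. The following Python sketch parses segment:offset(base,index,scale) and rebuilds the NASM bracket expression. The name `att_mem_to_nasm` is hypothetical, and it only handles numeric offsets and the 32-bit register forms shown above, not symbolic displacements.

```python
import re

# AT&T memory operand: [%seg:]disp(base,index,scale), every part optional.
_MEM = re.compile(
    r"^(?:%(?P<seg>[a-z]s):)?"           # segment override, e.g. %es:
    r"(?P<disp>-?(?:0x[0-9a-f]+|\d+))?"  # signed offset, no '$'
    r"(?:\((?P<base>%[a-z]+)?"           # optional base register
    r"(?:,(?P<index>%[a-z]+)"            # optional index register
    r"(?:,(?P<scale>[1248]))?)?\))?$"    # optional scale: 1, 2, 4 or 8
)


def att_mem_to_nasm(operand: str) -> str:
    """Translate an AT&T memory operand into NASM bracket notation."""
    m = _MEM.match(operand.replace(" ", "").lower())
    if m is None or not (m["disp"] or m["base"] or m["index"]):
        raise ValueError(f"unsupported memory operand: {operand!r}")
    terms = []
    if m["base"]:
        terms.append(m["base"][1:])  # drop the '%'
    if m["index"]:
        scale = m["scale"] or "1"
        terms.append(m["index"][1:] + ("" if scale == "1" else "*" + scale))
    expr = "+".join(terms)
    disp = m["disp"]
    if disp:
        if not expr:
            expr = disp                  # absolute offset only
        elif disp.startswith("-"):
            expr += disp                 # e.g. eax-10
        else:
            expr += "+" + disp           # e.g. eax+ebx*2+100
    seg = m["seg"] + ":" if m["seg"] else ""
    return f"[{seg}{expr}]"


print(att_mem_to_nasm("%es:100(%eax,%ebx,2)"))  # [es:eax+ebx*2+100]
```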
Example instructions:
```asm
mov %ax, 100
mov %eax, -100(%eax)
```
The first instruction moves the value in register AX to offset 100 in the data segment (DS is the default segment), and the second moves the value in register EAX to [eax-100].
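To see what such an operand actually selects, the offset can be computed as base + index*scale + disp, wrapped to 32 bits. This Python sketch uses hypothetical register values and a hypothetical `effective_address` helper; it yields the offset within the segment, before any segment base is added.

```python
from typing import Dict, Optional

MASK32 = 0xFFFFFFFF


def effective_address(regs: Dict[str, int], disp: int = 0,
                      base: Optional[str] = None,
                      index: Optional[str] = None, scale: int = 1) -> int:
    """Offset selected by disp(base,index,scale): base + index*scale + disp."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index] * scale
    return ea & MASK32  # 32-bit address arithmetic wraps around


regs = {"eax": 0x1000, "ebx": 3}  # hypothetical register contents
print(hex(effective_address(regs, disp=-100, base="eax")))  # -100(%eax) -> 0xf9c
print(hex(effective_address(regs, disp=100, base="eax", index="ebx", scale=2)))  # 0x106a
```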
Operand Sizes
At times, especially when moving literal values to memory, it becomes necessary to specify the size of the transfer, that is, the operand size. For example, the instruction
```asm
mov $10, 100
```
only specifies that the value 10 is to be moved to memory offset 100, not the transfer size. In NASM this is done by adding a size keyword (byte, word, dword, etc.) to one of the operands. In AT&T syntax, it is done by adding a suffix (b, w or l) to the instruction mnemonic. For example,
```asm
movb $10, %es:(%eax)
```
moves the byte value 10 to the memory location [es:eax], whereas
```asm
movl $10, %es:(%eax)
```
moves the long (dword) value 10 to the same place.
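A few more examples:
```asm
movl $100, %ebx
pushl %eax
popw %ax
```
When one of the operands is a register, as in these examples, the register already fixes the operand size, so GAS also accepts these instructions without a suffix.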
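The suffix-to-keyword mapping can be sketched in Python. All names here are hypothetical, and `SIZED_BASES` is an assumed, non-exhaustive list of mnemonics; it is needed because some mnemonics (call, shl) already end in a letter that looks like a suffix. The 'q' entry is the x86-64 quadword suffix, included only for completeness.

```python
from typing import Optional, Tuple

SUFFIX_BYTES = {"b": 1, "w": 2, "l": 4, "q": 8}
NASM_SIZE = {1: "byte", 2: "word", 4: "dword", 8: "qword"}

# Hypothetical, non-exhaustive set of mnemonics that accept a size suffix.
SIZED_BASES = {"mov", "add", "sub", "cmp", "push", "pop", "inc", "dec"}


def split_size_suffix(mnemonic: str) -> Tuple[str, Optional[int]]:
    """Split e.g. 'movl' into ('mov', 4); unsuffixed mnemonics give (m, None)."""
    if mnemonic in SIZED_BASES:
        return mnemonic, None
    base, suffix = mnemonic[:-1], mnemonic[-1:]
    if base in SIZED_BASES and suffix in SUFFIX_BYTES:
        return base, SUFFIX_BYTES[suffix]
    return mnemonic, None  # e.g. 'call' is not 'cal' + 'l'


def nasm_size_keyword(mnemonic: str) -> str:
    """Return the NASM size keyword implied by an AT&T suffix."""
    _, size = split_size_suffix(mnemonic)
    if size is None:
        raise ValueError(f"{mnemonic!r} carries no size suffix")
    return NASM_SIZE[size]


print(split_size_suffix("movl"), nasm_size_keyword("movb"))  # ('mov', 4) byte
print(split_size_suffix("popw"))                             # ('pop', 2)
```

For instance, movl $10, %es:(%eax) corresponds to NASM's mov dword [es:eax], 10.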
Control Transfer Instructions
The jmp, call, ret, etc. instructions transfer control from one part of a program to another. They can be classified as control transfers within the same code segment (near) or to a different code segment (far). The possible types of branch addressing are:
- relative offset (label)
- register
- memory operand
- segment-offset pointer
Relative offsets are specified using labels, as shown below.
```asm
label1:
    # ... other instructions ...
    jmp label1
```
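Although the label is written symbolically, the assembler stores a displacement measured from the end of the jump instruction. As a Python sketch, assuming the 2-byte short form of jmp (opcode 0xEB followed by a signed 8-bit displacement) and a hypothetical `encode_short_jmp` helper:

```python
def encode_short_jmp(jmp_addr: int, target: int) -> bytes:
    """Encode 'jmp target' as the short form EB rel8.

    rel8 is measured from the address of the next instruction, i.e.
    jmp_addr + 2, because the short jmp is 2 bytes long.
    """
    rel = target - (jmp_addr + 2)
    if not -128 <= rel <= 127:
        raise ValueError("out of short-jump range; a near jmp (E9 rel32) is needed")
    return bytes([0xEB, rel & 0xFF])  # two's-complement byte


# 'label1: jmp label1' (a jump to itself) encodes as EB FE, i.e. rel8 = -2.
print(encode_short_jmp(0x1000, 0x1000).hex())  # ebfe
# A backward jump to 0x1000 from a jmp placed at 0x1010.
print(encode_short_jmp(0x1010, 0x1000).hex())  # ebee
```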
Branch addressing using registers or memory operands must be prefixed by a '*'. To specify a far control transfer, an 'l' must be prefixed to the mnemonic, as in 'ljmp', 'lcall', etc. For example,
GAS syntax | NASM syntax |
---|---|
jmp *100 | jmp near [100] |
call *100 | call near [100] |
jmp *%eax | jmp near eax |
call *%ecx | call near ecx |
jmp *(%eax) | jmp near [eax] |
call *(%ebx) | call near [ebx] |
ljmp *100 | jmp far [100] |
lcall *100 | call far [100] |
ljmp *(%eax) | jmp far [eax] |
lcall *(%ebx) | call far [ebx] |
ret | retn |
lret | retf |
lret $0x100 | retf 0x100 |
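The two prefix rules ('*' for indirect targets, a leading 'l' for far transfers) can be captured in a small Python translator. This is a hypothetical sketch (`att_branch_to_nasm` is not an existing tool) that only understands the operand forms used in the table; it also rejects a far transfer through a register, since a far indirect jump or call takes its segment:offset pointer from memory.

```python
import re


def att_branch_to_nasm(insn: str) -> str:
    """Translate GAS control-transfer forms like those in the table into NASM.

    Hypothetical helper: handles jmp/call/ljmp/lcall with a '*'-prefixed
    register, (register) or absolute memory operand, and ret/lret with an
    optional $immediate. Anything else raises ValueError.
    """
    parts = insn.split(None, 1)
    if not parts:
        raise ValueError("empty instruction")
    mnemonic = parts[0]
    operand = parts[1].strip() if len(parts) > 1 else ""

    if mnemonic in ("ret", "lret"):
        nasm = "retn" if mnemonic == "ret" else "retf"
        if operand:
            if not operand.startswith("$"):
                raise ValueError(f"bad return operand: {operand!r}")
            nasm += " " + operand[1:]  # 'lret $0x100' -> 'retf 0x100'
        return nasm

    far = mnemonic in ("ljmp", "lcall")  # leading 'l' marks a far transfer
    op = mnemonic[1:] if far else mnemonic
    if op not in ("jmp", "call") or not operand.startswith("*"):
        raise ValueError(f"unsupported form: {insn!r}")
    target = operand[1:]  # drop the '*' indirection marker
    distance = "far" if far else "near"

    if re.fullmatch(r"%[a-z]+", target):  # *%reg: target address in a register
        if far:
            raise ValueError("a far indirect transfer needs a memory operand")
        return f"{op} {distance} {target[1:]}"
    m = re.fullmatch(r"\((%[a-z]+)\)", target)  # *(%reg): target read from memory
    if m:
        return f"{op} {distance} [{m.group(1)[1:]}]"
    if re.fullmatch(r"0x[0-9a-fA-F]+|\d+", target):  # *100: absolute memory operand
        return f"{op} {distance} [{target}]"
    raise ValueError(f"unsupported operand: {operand!r}")


print(att_branch_to_nasm("lcall *(%ebx)"))  # call far [ebx]
```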
Segment-offset pointers are specified using the following format:
```asm
jmp $segment, $offset
```
For example:
```asm
jmp $0x10, $0x100000
```
The GNU as manual writes this immediate far form as ljmp $section, $offset.
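Assuming the example runs as 32-bit protected-mode code, the first operand is a segment selector rather than a real-mode segment value. This Python sketch (hypothetical helper names) decodes the selector 0x10 into its fields and shows the bytes the far jump assembles to: opcode 0xEA followed by the 4-byte offset and then the 2-byte selector, both little-endian.

```python
import struct


def decode_selector(selector: int) -> dict:
    """Split a protected-mode segment selector into its three fields."""
    return {
        "index": selector >> 3,     # descriptor number in the GDT/LDT
        "ti": (selector >> 2) & 1,  # table indicator: 0 = GDT, 1 = LDT
        "rpl": selector & 3,        # requested privilege level
    }


def encode_far_jmp32(selector: int, offset: int) -> bytes:
    """Encode a direct far jmp in 32-bit code: EA, then offset32, then selector16."""
    return b"\xea" + struct.pack("<IH", offset, selector)


print(decode_selector(0x10))                      # {'index': 2, 'ti': 0, 'rpl': 0}
print(encode_far_jmp32(0x10, 0x100000).hex(" "))  # ea 00 00 10 00 10 00
```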
If you keep these few things in mind, you'll catch on very soon. For more details on the GNU assembler, see its documentation.