The wonderful world of assembly
This repository tries to give an overview and general feel for the family of assembly languages. It doesn't aim to be a precise reference for a certaint dialet (there are plenty of good cheat sheets out there). This summary tries to give an intuition for common patterns and some "experience" of what is out there.
In the directory src
you can find a collection of examples. Most of the time, you can run an example with $ gcc -m32 -o example example.s
.
variables
Local variables
Stored on the stack. Compiler doesn't know absolute value, uses frame pointer to resolve relative address. Referenced by negative offsets from base pointer.
Parameters
Referenced by positive offsets from base pointer.
Registers
Naming
src: http://www.cs.princeton.edu/courses/archive/spring04/cos217/lectures/IA32-I.pdf
example:
- eax: 32-bit general-purpose register
- ax: 16-bit (lower half)
- ah, hl: 8-bit registers (high and low)
%eax, %ax, %ah, %al
Accumulator for operands, results. Return values often end up here.
%ebx, %bx, %bh, %bl
Pointer to data in the DS segment.
%ecx, %cx, %ch, %cl
Counter for string, loop operations
%edx, %dx, %dh, %dl
I/O pointer
%esi, %si
Pointer to DS data, string source
%edi, %di
Pointer to ES data, string destination
Base pointer = BP, Frame Pointer = FP, %ebp, %bp
Pointer to data on the stack. Points to the previous frame's base pointer. Points to the base of the current stackframe: The base of the stack frame is not the first "item" belonging to the frame (like a parameter), but think of base as a reference point from which you can get to all the data of the function by using relative addresses (positive ones for parameters, negative ones for local variables etc.). Reference point on stack to get local variables and function parameters in current stack frame. Only manipulated explicitly.
Stack pointer, %esp
Points to the last value that was stored, under the assumption that its size will match the operating mode of the processor. A push decreases the value, a pop increases the pointer.
Instruction pointer, %rip
Programm Counter, Instruction Pointer, EIP
Changed by:
- unconditional jump
- conditional jump
- procedure call
- return
commands
v
A simple value: 0xdeadbeef (The number), %eax (content of the register)
memory operand (v), [v]
Value at memory location v, except at lea instruction.
Examples:
mov eax, [ebx]
mov eax, [ebx+3]
movl (%ebx), %eax
movl 3(%ebx), %eax
Intel syntax: segreg:[base+index*scale+disp]
AT&T syntax: %segreg:disp(base, index, scale)
optional: index, scale, disp, segreg
direction of operands: src,dst or dst,src?
intel syntax: dest,source AT&T syntax: source,dest
push v
Write v to stack, decrease stack pointer.
pop v
Take top of the stack and write into v, increase stack pointer.
### mov a l Computes addres a and stores value at a into location l.
call v
Call the function v, return value in eax.
lea a l, Load Effective Address
Computes address a and stores it into location l.
AT&T suffixes
- l: long
- w: word
- b: byte
src: http://www.imada.sdu.dk/Courses/DM18/Litteratur/IntelnATT.htm
registers
Syntax
Statement format
One statement per line. Generally:
[label] mnemonic [operands] [;comment]
https://www.tutorialspoint.com/assembly_programming/assembly_basic_syntax.htm
Constants
Sometimes 1
, sometimes $1
.
Registers
Sometimes AH
, sometimes eax
, sometimes %eax
.
Commands
Sometimes MOV
, sometimes mov
.
Comments
Sometimes with ;
, sometimes with #
.
Examples intel syntax
mov eax, 1
mov ebx, 0ffh
int 80h
Examples AT&T Syntax
movl $1, %eax
movl $0xff, %ebx
int $0x80
procedures
procedure call, caller-side
Different types of data can be passed at different locations. Memory (large stuff or things without a "standard" size) often is passed on the stack by pushing them backwards. example:
function(1,2,3);
gives:
pushl $3
pushl $2
pushl $1
call function
Integers often are passed in registers: %rdi, %rsi, %rdx, %rcx, %r8, %r9
Vectors are passed in %xmm0 to %xmm7.
procedure call, callee-side = procedure prolog
save previous FP (resotre at procedure exit) Copy SP into FP. Advance SP to reserve space for the local variables. http://insecure.org/stf/smashstack.html
pushl %ebp ; push frame pointer onto stack
movl %esp, %ebp ; copy sp to bp, making it the new frame poitner
subl $20, %esp ; allocate space for local variables
- Function parameters pushed to stack in reverse order.
- Push EIP return address.
- prolog: push %ebp
- prolog: %esp assigned to %ebp
- Allocate space for local variables (subtract size to %esp)
procedure epilog
- Exit procedure
- clean up stack
- Pop %ebp
- %ebp assigned to %esp
System calls for macOS
#define SYS_syscall 0
#define SYS_exit 1
#define SYS_fork 2
#define SYS_read 3
#define SYS_write 4
#define SYS_open 5
#define SYS_close 6
#define SYS_wait4 7
Sources
http://www.fabiensanglard.net/macosxassembly/index.php http://www.idryman.org/blog/2014/12/02/writing-64-bit-assembly-on-mac-os-x/ http://insecure.org/stf/smashstack.html http://www.imada.sdu.dk/Courses/DM18/Litteratur/IntelnATT.htm