libasm

All the explanations in this file follow the System V AMD64 ABI calling convention.

Apple Silicon 
Prerequisites and basic concepts 📚
- Registers 📦
- Execution flow ⏩
- Syntax 💻
  - Instructions 🤖
  - Behavior 🔄
  - Addresses and values 📍
  - Exporting symbols 📤
ft_strlen
ft_strcpy
ft_strcmp
ft_write
ft_read
ft_strdup
Resources 📖

Apple Silicon

Apple Silicon is based on ARM64 architecture. The assembly code in this repository is written for x86_64 architecture.

If you wish to use and test your assembly code seamlessly on those chips, you need to add the following lines to your Makefile:

CC = gcc
ifeq ($(shell uname -m), arm64)
	CC += -ld_classic --target=x86_64-apple-darwin
endif

Instead of rewriting all the assembly code for ARM64 architecture, we basically compile our C code in x86_64 architecture as it can be executed through Rosetta 2.

Don't forget the -f macho64 flag after your nasm command when compiling on Macs.

Prerequisites and basic concepts

Registers

Assembly works mainly with registers. They can be compared to little pre-defined boxes that can store data.

The general-purpose registers are:

rax
rcx
rbx
rdx
rsi
rdi
rsp
rbp
r8, r9, ...r15
rip

We have to be careful when putting data in those registers, because they can be used by the system at any time (read or written).

Among the above registers for example:

rax is used to store the return value of a function.
rcx is used as a counter in loops.
rsp is used to store the stack pointer.
rbp is used to store the base pointer.
rdi, rsi, rdx, rcx, r8, r9 are used respectively to pass arguments, vulgarly like function(rdi, rsi, rdx, rcx, r8, r9)

This may seem like an inconvenience, but it is actually very useful to manipulate the behavior of the program.

For instance, if we want to call sys_write:

We put the syscall number (4 for sys_write) in rax.

We put the file descriptor in rdi.

We put the address of the buffer in rsi.

We put the number of bytes to write in rdx.

We call the syscall instruction.

The conclusion is: the little boxes are very useful as intermediate storage for data, but we have to be careful not to overwrite them when we (or the system) need them.

Execution flow

Assembly is read from top to bottom.

The instructions can be grouped in labels, which are used to mark a specific point in the code. They are followed by a colon.

One thing that can be confusing is that although labels look like functions in other languages like C, they are not.

For example:

entry_point:
	xor rax, rax

do_something:
	mov rax, 0

do_something_else:
	mov rax, 1

return_label:
	ret

In this example, entry_point is the entry point of the program/function.

do_something and do_something_else will be executed one after the other, even without a "jump" instruction.

Syntax

The syntax used in this repository is the Intel syntax. It is the most common syntax used in assembly programming, and a requisite of the subject. It is characterized by the fact that the destination operand is on the left and the source operand is on the right.

Instructions

Virtually all the lines in assembly are composed of an instruction followed by its operand(s).

A few examples:

mov rax, 0 copies the value 0 into the register rax.
add rax, 1 adds 1 to the value in the register rax.
cmp rax, 0 compares the value in the register rax with 0.
jmp do_something jumps to the label do_something.

Behavior

It is very important to remember that every instruction can alter the behavior of the program implicitly.

For example:

The cmp instruction will set the flags register according to the result of the comparison.
The loop instruction will decrement the rcx register and jump to the label if rcx is not zero.

Addresses and values

Like in C, we can work with addresses.

The square brackets [] are used to dereference an address.

For example, if we want to move the value at the address 0x1234 into the register rax, we can do:

mov rax, [0x1234]

If we want to compare the address 3 bytes after the address in rax with 0, we can do:

cmp [rax + 3], 0

Here we should technically use an identifier for the address (BYTE, WORD, DWORD, QWORD) to specify the size of the data we want to compare, but we ignored it for the sake of this explanation.

Exporting symbols

In order to use the functions we write in assembly in a C program, we need to export them.

To do so, we can use the global directive.

For example, if we want to export the function ft_strlen, we can do:

global ft_strlen

ft_strlen:
	...

ft_strlen

The ft_strlen function is a function that returns the length of a string. It is a very simple function that iterates over the string until it finds the null-terminator (\0).

To implement it in assembly, we need to recapitulate the behavior of the function:

Set a counter to 0.
Look at the first character of the string.
Increment the counter.
Look at the next character.
If it is not the null-terminator, increment the counter and go back to step 4.
If it is the null-terminator, return the counter.

To replicate this behavior in assembly, we will need to learn a few instructions:

mov to copy data.
jmp to jump.
cmp to compare data.
je to jump if equal.
inc to increment a register.
ret to return from the function.

`mov`

mov rax, rdi

This instruction copies the value in rdi to rax.

`jmp`

entry_point:
	jmp some_other_label

some_label:
	...

some_other_label:
	...

The first line of entry_point will jump to the label some_other_label (and skip some_label) regardless of the condition.

Jumps work like a goto in C. They can be used to skip parts of the code, or to create loops (as they can jump to a label that is located earlier in the code).

`cmp`

cmp rax, rdi

This instruction compares the value in rax with the value in rdi.

If we want to check if rax is equal to rdi, we can do:

cmp rax, rdi
je equal

Which leads us to the next instruction.

`je`

cmp some_register, some_other_register
je equal

This instruction will jump to the label equal only if the two registers are equal.

`inc`

inc rax

This instruction increments the value in rax by 1. Simple.

`ret`

ret

This instruction returns from the program/function.

Remember that in the System V AMD64 ABI, the return value of a function is stored in the rax register. Whatever is in rax when we call ret instruction will be the return value of the program/function.

Implementation

Now that we know the instructions we need, we can implement the ft_strlen function.

The first thing we need to do is to set the counter to 0. We can do this by moving 0 to rcx (or any other register, but remember rcx is commonly used as a counter).

mov rcx, 0

Then, we need to define our recurring loop.

We need to look at every character in the string passed in rdi (where is passed the first argument in this calling convention, as seen before).

So, rdi first points at the first character of the string, and rcx is our counter initialized to 0.

Like we would look into *(str + i) in C, we can use assembly's square brackets [] as follows:

cmp [rdi + rcx], 0

What happens next?

If the character is not the null-terminator, we need to increment the counter and go back to the beginning of the loop.
If it is the null-terminator, we need to return the counter.

So, with the instructions we learned before, we can use:

cmp to compare the character with 0
je to jump to the end of the function if it is the null-terminator.
inc to increment the counter.
jmp to go back to the beginning of the loop.

Our loop would then look like:

loop:
	cmp [rdi + rcx], 0
	je end
	inc rcx
	jmp loop

Finally, we need to return the counter. We can do this by moving the value in rcx to rax and returning.

Remember, that "whatever is in rax when we call ret instruction will be the return value of the program/function."

end:
	mov rax, rcx
	ret

And that's it! We have implemented the ft_strlen function in assembly.

ft_strcpy

The ft_strcpy function is a function that copies a string into another string, and returns a pointer to the destination string.

The logic is very similar to the ft_strlen function:

Set a counter to 0.
Look at the first character of the source string.
Copy it to the destination string.
Increment the counter.
Look at the next character and copy it.
If it is not the null-terminator, increment the counter and go back to step 5.
If it is the null-terminator, exit the loop and return a pointer to the destination string.

Implementation

A pointer to our dst string is passed in rdi, and a pointer to our src string is passed in rsi.

We can start the same way we did with ft_strlen by setting the counter to 0.

ft_strcpy:
	mov rcx, 0

We can then start our loop.

We need to copy every character in rsi to rdi. However, in assembly, we can't copy data directly from one address to another (mov [rdi], [rsi] would not work).

We therefore need to copy the data from the source address to a register, and then copy it to the destination address.

We could use any register to store the character (like r8 as seen before), but it is more appropriate to use al for this purpose, as it is a register that is meant to store a single byte.

loop:
	mov al, [rsi + rcx]
	mov [rdi + rcx], al
	inc rcx
	cmp al, 0
	jne loop

In this loop:

We copy the character in rsi to al.
We copy al to rdi.
We increment the counter.
We check if the character is the null-terminator.
If it is not, we go back to the beginning of the loop.

Finally, we need to return the pointer to the destination string.

Given that we received this pointer in rdi and that we did not move it, we can simply copy it to rax and return.

mov rax, rdi
ret

ft_strcmp

The ft_strcmp function is a function that compares two strings, and returns the difference between the first two different characters. For example, "abc" and "abd" would return -1, as 'c' - 'd' = -1.

Its logic is not much more complex than the previous functions, but the particularities of assembly make it a bit more challenging (for me at least, maybe I have a shitty logic).

Let's recap the behavior of the function anyway:

Set a counter to 0.
Compare every character of the first string with every character of the second string.
If they are different, substract the second character from the first character and return the result.

From this exercise on, I will only explain the new instructions and concepts we use.

`sub`

sub rax, rdi performs a substraction such as rax ← rax - rdi.

`movzx`

movzx rax, BYTE[rdi + rcx] moves the byte at the address rdi + rcx to rax, and fills the remaining bits with 0.

movzx will adapt to the keyword used to specify the size of the data we want to move. For example, movzx rax, WORD[rdi + rcx] would move a word (16 bits) to rax.

This instruction is useful to us as rax is a 64-bit register, and we only want to compare the characters as bytes (8 bits).

`jz`

jz label jumps to label if the zero flag is set, a bit like je and jne but for the zero flag.

For example, if we want to jump to end if one of register1 or register2 is 0, we can do:

cmp register1, 0
cmp register2, 0
jz end

Implementation

I will not show code like in the previous ones, just describe the logic with more depth.

My intermediates registers will be rax and r8 (not the most efficient code but easier to understand).

We set rcx, rax and r8 to 0.
We start our loop.
We copy rdi and rsi to rax and r8.
We check that the characters are not the null-terminator.
We compare the characters.
If they are different, we substract them and return.
If not, we increment the counter and go back to the beginning of the loop.

That's it! With all the precautions I mentioned and the new instructions, you should be able to implement the ft_strcmp function.

ft_write

The ft_write function is a function that, provided a file descriptor, a buffer and a size, writes the buffer to the file descriptor. It returns the number of bytes written, or -1 if an error occurred.

To implement this function, we need to know how to make a syscall.

Syscalls

As we saw in the introduction, to make a syscall, we need to:

Put the syscall number in rax.
Put its arguments in the appropriate registers (here rdi, rsi and rdx).
Trigger the syscall with the syscall instruction, that will read the values in the registers and behave accordingly.

First steps

In our case, the parameters we receive from C are already in the right registers, so we can save the following instructions:

mov rdi, rdi ; file descriptor
mov rsi, rsi ; buffer
mov rdx, rdx ; count

We can then put the syscall number in rax and call the syscall.

On Apple Silicon Macs, the syscall number for sys_write is 0x2000004. On Linux, it is 1.

mov rax, 0x2000004
; mov rdi, rdi
; mov rsi, rsi
; mov rdx, rdx
syscall
ret

We could stop here, as the function would properly write to the file descriptor and return the number of bytes written (the syscall putting its return value in rax).

However, we need to return -1 if an error occurred, and set the errno variable accordingly, as asked in the subject.

To handle any error and jump to another label, we can use the jc instruction. This will jump to the label if the Carry Flag is set, which is generally the case when an error occurs.

jc error

`errno`

errno is a variable that is set when an error occurs. It is a global variable that is set by the system when a syscall fails.

However, it is not automatically translated to the errno variable in C. We need to set it ourselves.

To do so, we are provided with __errno_location (or ___error on Mac). This function returns a pointer to the errno variable.

We can call it with the instruction call __errno_location. As for other instructions, it will put the return value in rax.

However, rax already contains the return value of the write syscall. We need to save it before calling __errno_location.

error:
	mov r8, rax
	call __errno_location

We now have the address of the errno variable in rax. We can put the previously saved error code in it by dereferencing the address.

mov [rax], r8

Finally, we can return -1 and exit the function.

mov rax, -1
ret

And that's it! We have implemented the ft_write function.

ft_read

The ft_read function is a function that, provided a file descriptor, a buffer and a size, reads from the file descriptor to the buffer. It returns the number of bytes read, or -1 if an error occurred.

It's literally implementing the ft_write function but with the sys_read syscall.

Same logic, same instructions, same everything.

So we just replace 0x2000004 with 0x2000003 and we're good to go.

Easy.

ft_strdup

The ft_strdup function is a function that duplicates a string. It allocates memory for the new string, copies the string to it, and returns a pointer to the new string.

Doesn't sound as easy as ft_read.

Luckily, we can use the functions we implemented before, and the malloc function. But we also need to learn two new instructions.

`push` and `pop`

We have another way to store data in assembly: the stack.

The stack is literally a pile of data, organized in a LIFO (Last In, First Out) way.

If I push 1, 2 and 3 on the stack, it will look like:

3
2
1

If I pop the stack here, I will get 3.

In assembly, we can manipulate the stack as follows:

push rax
push rbx

This instruction will push the value of rax, then the value of rbx on the stack.

value of rbx
value of rax

We then have the pop command, that will take the last value pushed on the stack and put it in the operand we specify.

pop any_register

After this instruction, any_register will contain the value of rbx.

Implementation

In this implementation, we will use push and pop to save the values of our registers when calling other functions that manipulate them.

Let's recap the steps of the function:

Get the length of the string (ft_strlen).
Allocate memory for the new string (malloc).
Copy the string to the new string (ft_strcpy).

We also know that:

When ft_strdup is called, rdi contains *s (the string to duplicate).
ft_strlen reads a string in rdi and returns the length in rax.
malloc reads the size in rdi and returns a pointer to the allocated memory in rax.
ft_strcpy reads the source string in rsi and the destination string in rdi.

So:

We don't need to touch rdi before calling ft_strlen.
We call ft_strlen, that saves the length in rax.
We push the rdi (that contains *s) to the stack, to use it later.
We increment the length by 1 (for the null-terminator).
We move the length of *s to rdi to call malloc with the right size.
We call malloc, that returns a pointer to the allocated memory in rax.
We pop the *s from the stack to rsi (second argument of ft_strcpy).
We move the pointer to the allocated memory to rdi (first argument of ft_strcpy).
We call ft_strcpy, that copies the string to the allocated memory.
We can directly call ret as the pointer to the new string is already in rax.

And that's it! We have implemented the last mandatory function of the project.

Changes to make on Linux

The syscall number for sys_write is 1 on Linux.
The syscall number for sys_read is 0 on Linux.
The ___error function is named __errno_location on Linux.
The symbols don't need to be prefixed with an underscore on Linux.
The Carry Flag is not set when an error occurs on Linux. We need to check if rax is negative to detect an error.
The nasm flag -f elf64 should be used instead of -f macho64.

leogaudin/libasm

libasm

Table of Contents

Apple Silicon

Prerequisites and basic concepts

Registers

Execution flow

Syntax

Instructions

Behavior

Addresses and values

Exporting symbols

ft_strlen

`mov`

`jmp`

`cmp`

`je`

`inc`

`ret`

Implementation

ft_strcpy

Implementation

ft_strcmp

`sub`

`movzx`

`jz`

Implementation

ft_write

Syscalls

First steps

`errno`

ft_read

ft_strdup

`push` and `pop`

Implementation

Changes to make on Linux

Resources

leogaudin/libasm

libasm

Table of Contents

Apple Silicon

Prerequisites and basic concepts

Registers

Execution flow

Syntax

Instructions

Behavior

Addresses and values

Exporting symbols

ft_strlen

mov

jmp

cmp

je

inc

ret

Implementation

ft_strcpy

Implementation

ft_strcmp

sub

movzx

jz

Implementation

ft_write

Syscalls

First steps

errno

ft_read

ft_strdup

push and pop

Implementation

Changes to make on Linux

Resources

`mov`

`jmp`

`cmp`

`je`

`inc`

`ret`

`sub`

`movzx`

`jz`

`errno`

`push` and `pop`