minitalk

A minimalistic implementation of a small data exchange program using UNIX signals. 

Table o'Contents

About 📌
- Mandatory Features
- Bonus Features
Implementation 📜
Usage 🏁
Testing 🧪
Appendix 📖
- Unicode Character Encoding
License

About 📌

The goal of this project is to develop a client-server communication program using UNIX signals only.

Mandatory Features

The mandatory implementation must behave as follows:

First the server must be started, which will generate a pid and print it to stdout.
The client should accept two parameters:
- The pid of the server;
- The message to be sent;
The client must send the message passed in as a parameter to the server.
Upon receiving the message the server must print it to stdout almost instantly.
The server must be able to receive messages from several different clients in a row without the need for a restart. (Note that Linux systems do NOT queue signals when a signal of the same type is already pending).
Client-server communication must be done using SIGUSR1 and SIGUSR2 signals only.

Bonus Features

The server acknowledges receiving a message by sending back a signal to the client.
Support Unicode characters.

Implementation 📜

For this project I chose to implement both mandatory and bonus features together. The server and client can be found in the server.c and client.c files inside the src folder, along with two additional files, ft_sigaction.c and ft_send.c, containing helper functions.

`t_protocol`

For the sake of simplicity the program uses a custom data type t_protocol which holds all the data the server needs to perform its operations:

typedef struct s_protocol
{
	int  bits;     // Number of bits received
	int  data;     // Received data (One integer and a sequence of chars)
	int  received; // Flag indicating if "header" data has been received
	char *msg;     // Received message
}	t_protocol;

`server.c`

To implement the server's signal handling functionality I chose to use sigaction() over signal().

This is because the behaviour of signal() varies across UNIX versions, making it a non-portable option; new applications are advised to use sigaction() instead.

Important

Both functions install a handler for user-defined signals, replacing their default signal actions. The main difference between these functions is that sigaction() employs a specialized struct to store extra information, giving the user finer control over what they can do when handling a signal.

Initializing `sigaction`

The server's main() function declares and initializes a struct sigaction variable called sa.

struct sigaction	sa;

sigemptyset(&sa.sa_mask);
sa.sa_sigaction = ft_server_sighandler;
sa.sa_flags = SA_SIGINFO | SA_RESTART;

sa.sa_mask specifies the set of signals that are blocked while the handler is running;
We use sigemptyset() to initialize the signal set sa.sa_mask with all signals excluded from the set;
sa.sa_sigaction is set to the function ft_server_sighandler();
sa.sa_flags has the bits for SA_SIGINFO and SA_RESTART turned on;

Note

SA_SIGINFO : gives the user access to extended signal information; This flag makes sigaction() switch where it looks for the custom signal handler, changing it from the sa.sa_handler member to sa.sa_sigaction.
SA_RESTART : provides BSD compatible behaviour allowing certain system calls to be restartable across signals.
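As a minimal sketch of what SA_SIGINFO makes available, the block below installs a three-argument handler and records the sender's PID from info->si_pid. The names g_last_sender, demo_handler and demo_install are hypothetical and only serve this illustration; they are not part of the project.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical global: PID of the last process that signalled us. */
static volatile sig_atomic_t	g_last_sender = -1;

/* Three-argument handler form, selected by the SA_SIGINFO flag. */
static void	demo_handler(int sig, siginfo_t *info, void *context)
{
	(void)sig;
	(void)context;
	g_last_sender = (sig_atomic_t)info->si_pid;
}

/* Install demo_handler for SIGUSR1. Returns 0 on success, -1 on failure. */
static int	demo_install(void)
{
	struct sigaction	sa;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_sigaction = demo_handler;
	sa.sa_flags = SA_SIGINFO | SA_RESTART;
	return (sigaction(SIGUSR1, &sa, NULL));
}
```

Without SA_SIGINFO the handler would only receive the signal number, so the server would have no way to tell which client sent a given bit.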

The sa struct is then passed to ft_set_sigaction() to initialize event handling for SIGUSR1 and SIGUSR2 signals.

ft_set_sigaction(&sa);

Note

See ft_sigaction for more details on what sigaction() does.

Then the server prints its pid to stdout and enters an infinite loop, listening for a signal to catch.

ft_print_pid();
while (1)
	pause();

`ft_server_sighandler()`

static void	ft_server_sighandler(int sig, siginfo_t *info, void *context);

Any time a SIGUSR1 or SIGUSR2 signal is received, ft_server_sighandler() is called.

All its local variables are static, therefore automatically initialized to 0, except for client_pid, which we initialize to -1 to mean that no client is connected yet.

static t_protocol   server;
static int          i;
static pid_t        client_pid = -1;

usleep(PAUSE);
(void)context;

The server signal handler waits for PAUSE (100 microseconds) before it starts receiving data.
context is cast to void to suppress unused-parameter compiler warnings, since we do not need to use it.

if (client_pid == -1)
    client_pid = info->si_pid;
else if (client_pid != info->si_pid)
{
    if (server.msg)
        free(server.msg);
    ft_perror_exit("Client PID does not match\n");
}

If client_pid is -1, the server hasn't connected to a client yet, so the program sets client_pid to info->si_pid, the pid of the client currently connecting.
If client_pid is not equal to the pid of the current client, the server frees the allocated message, prints an error, and exits.

if (!server.bits)
	server.data = 0;

If server.bits is 0, the server is about to start a new chunk of data (the header int or the next char), so the program sets server.data to 0 to prepare to receive the incoming bits.

Receiving Data

The server first receives an integer as "header information" specifying the length in bytes of the message about to be transferred, then come the actual bits of the message.

To store the bits according to the data type being received the following bitwise operations and conditionals are employed:

if ((sig == SIGUSR2) && !server.received)
	server.data |= 1 << (((sizeof(int) * 8) - 1) - server.bits);
else if ((sig == SIGUSR2) && server.received)
	server.data |= 1 << (((sizeof(char) * 8) - 1) - server.bits);

The conditional statements make sure that the first sizeof(int) * 8 bits of incoming data (32 on most platforms) are saved in a space that fits an int.
The bitwise operators | (OR) and << (Left-Shift) are used together to set the received bits in their right place in memory.
After this int is received the server starts storing the following inbound bits into char sized chunks of memory.

Note

These memory-writing bitwise operations only happen when a SIGUSR2 is received.

Any time a SIGUSR1 is caught, the server simply acknowledges by sending back a SIGUSR1 to the client.

Because the memory in server.data is initially set to 0, the server only needs to act when a 1 is received, and flip the appropriate bit in its right place in memory.

Important

SIGUSR1 and SIGUSR2 are therefore used to signify 0 and 1 respectively.
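The bit placement described above can be sketched in isolation, without any signals involved. The helper names set_bit_msb_first and byte_from_bits are hypothetical. The sketch uses unsigned arithmetic, an assumption that differs from the int field in t_protocol, because shifting a 1 into the sign bit of a signed int is undefined behaviour in C.

```c
#include <limits.h>

/*
 * Set bit number `index` (0 = most significant) in a value that is
 * `width` bits wide. A received 0 needs no call: the bit is already clear.
 */
static unsigned int	set_bit_msb_first(unsigned int acc, int index, int width)
{
	return (acc | (1U << (width - 1 - index)));
}

/* Rebuild one byte from eight '0'/'1' characters, first character = MSB. */
static unsigned char	byte_from_bits(const char *bits)
{
	unsigned int	acc;
	int				i;

	acc = 0;
	i = 0;
	while (i < CHAR_BIT)
	{
		if (bits[i] == '1')
			acc = set_bit_msb_first(acc, i, CHAR_BIT);
		++i;
	}
	return ((unsigned char)acc);
}
```

For instance, the eight signals SIGUSR1, SIGUSR2, SIGUSR1, SIGUSR1, SIGUSR1, SIGUSR1, SIGUSR1, SIGUSR2 correspond to the bit string "01000001", which rebuilds the character A.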

`ft_strlen_received()`

static void	ft_strlen_received(t_protocol *server);

Once the int has been received the conditions for triggering the code block inside ft_strlen_received() are met:

if ((server->bits == (sizeof(int) * 8)) && !server->received) { ... }

This function first sets the server.received flag to 1, signifying that the header data has been received.

The server prints the length of the message to stdout, then takes this value plus 1 (to account for the null terminator) and allocates memory for a message with that many bytes with ft_calloc() so that the memory is all set to zero:

server->msg = ft_calloc((server->data + 1), sizeof(char));
if (!server->msg)
	ft_perror_exit("ft_calloc() failed\n");

The memory space for the message is then null terminated, and server->bits is reset to 0 to prepare the server to receive the bits of the message.

server->msg[server->data] = '\0';
server->bits = 0;

The function ends and the server continues receiving the message bit by bit until every char in the message has been transferred successfully.

`ft_print_msg()`

static void	ft_print_msg(t_protocol *server, int *i, pid_t *pid);

Once 8 bits have been received and the header information has already been transferred, the first layer of logic is triggered:

if ((server->bits == 8) && server->received) { ... }

The received byte stored in server.data is copied to the i-th index of server->msg.

Then i is incremented so that server->msg[i] refers to the next byte in memory, where the next char (or the next byte of a multi-byte Unicode character) will be stored.

server->msg[*i] = server->data;
++(*i);

Important

For more about Unicode check the Appendix, Unicode Character Encoding.

Notice that server.bits is reset to 0 after the char has been stored, in preparation to receive the next.

server->bits = 0;

And so the server receives each byte of the message until the whole message has been received.
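The receive logic described so far (length header first, then one byte at a time until a '\0' byte arrives) can be sketched as a small state machine that is fed one bit per call. This is an illustration under assumptions, not the project's handler: t_rx and rx_feed are hypothetical names, the write index and announced length are folded into the struct, calloc() stands in for ft_calloc(), and the length check is an addition the description above does not mention.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for t_protocol, plus write index and length. */
typedef struct s_rx
{
	int				bits;     /* bits collected for the current chunk */
	unsigned int	data;     /* chunk being assembled, MSB first */
	int				received; /* 1 once the length header is complete */
	size_t			len;      /* message length announced by the header */
	size_t			i;        /* next write position in msg */
	char			*msg;     /* message buffer, freed by the caller */
}	t_rx;

/*
 * Feed one bit. Returns 1 when the terminating '\0' byte has arrived
 * (rx->msg then holds the whole message), 0 while more bits are
 * expected, and -1 on allocation failure or if the sender overruns
 * the announced length.
 */
static int	rx_feed(t_rx *rx, int bit)
{
	int	width;

	width = (int)(sizeof(int) * CHAR_BIT);
	if (rx->received)
		width = CHAR_BIT;
	if (rx->bits == 0)
		rx->data = 0;
	if (bit)
		rx->data |= 1U << (width - 1 - rx->bits);
	if (++rx->bits < width)
		return (0);
	rx->bits = 0;
	if (!rx->received)
	{
		rx->len = rx->data;
		rx->i = 0;
		rx->msg = calloc(rx->len + 1, sizeof(char));
		if (!rx->msg)
			return (-1);
		rx->received = 1;
		return (0);
	}
	if (rx->i >= rx->len && rx->data != 0)
		return (-1);
	rx->msg[rx->i++] = (char)rx->data;
	if (rx->data != 0)
		return (0);
	rx->received = 0;
	return (1);
}
```

Keeping the state in a struct passed by pointer mirrors how the static t_protocol variable is handed to the helper functions, and makes the logic testable without sending real signals.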

Printing the Message

The server knows all the data in the message has been received when the current server.data value is the null terminator.

if (server->data == '\0') { ... }

The server then prints the message to stdout followed by the server's pid.

ft_printf("Message:\n%s%s%s\n", GRN, server->msg, NC);
ft_print_pid();

Now the server performs some clean up to prepare to receive the next message.

free(server->msg);
server->msg = NULL;
server->received = 0;
*i = 0;
*pid = -1;

Since we are done with server.msg, we free the memory space allocated to store it.
We set the server->msg pointer to NULL.
We set i and the server->received flag to 0.
We reset the pid to -1 to prepare the server to receive the next message from a different pid.

ft_send_bit(pid, 1, 0);

Finally we send a 1 bit (SIGUSR2) back to the client, without pausing, to signal that the whole message has been received.

`client.c`

Before starting operations the client must check if its input arguments are valid.

if (argc != 3)
	ft_perror_exit("Usage: ./client [PID] [message]\n");
else if (kill(ft_atoi(argv[1]), 0) < 0)
	ft_perror_exit("PID does not exist\n");

It first checks whether argc is different from 3; if so, the program prints an error to stderr and exits.
Then it checks whether the pid of the server (argv[1]) is valid by test-calling kill() with 0 instead of a signal identifier, which performs the error checking without sending a signal.
If the pid is NOT valid, the program also prints an error to stderr and exits.
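The kill() probe can be wrapped in a small helper, sketched below under assumptions. The name pid_exists is hypothetical, and the extra check for non-positive values is an addition: kill() treats a pid of 0 or a negative pid as a process-group target rather than a single process, so such values would pass the probe without naming a real server.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Returns 1 if a process with this PID exists, 0 otherwise.
 * PIDs <= 0 are rejected up front because kill() interprets them
 * as process groups, not as a single process.
 */
static int	pid_exists(pid_t pid)
{
	if (pid <= 0)
		return (0);
	if (kill(pid, 0) == 0)
		return (1);
	/* EPERM: the process exists but we are not allowed to signal it. */
	return (errno == EPERM);
}
```

Assuming ft_atoi() behaves like atoi(), a non-numeric argument converts to 0, and kill(0, 0) succeeds because it addresses the caller's own process group. A guard like the one above would catch that case.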

Initializing the Client's `sigaction`

The client, like the server, uses sigaction() to handle incoming UNIX signals:

struct sigaction	sa;

sigemptyset(&sa.sa_mask);
sa.sa_handler = ft_client_sighandler;
sa.sa_flags = SA_RESTART;
ft_set_sigaction(&sa);

A struct sigaction is declared as sa, then:

Its signal set sa.sa_mask is initialized with all signals excluded from the set using sigemptyset();
sa.sa_handler is set to the function ft_client_sighandler();
sa.sa_flags has the bit for SA_RESTART turned on;
The sa struct is then passed into ft_set_sigaction() to set event handling for SIGUSR1 and SIGUSR2;

When the client event handler receives a signal, it checks if it is SIGUSR1 or SIGUSR2:

If the incoming signal is SIGUSR1 (Data Reception Acknowledgement), it prints a * to stdout.

Else if it receives SIGUSR2 (Data Transmission Done), it prints a success message to stdout and exits.

The client then prints the server's pid to stdout and calls ft_send_msg():

ft_print_pid();
...
ft_send_msg(ft_atoi(argv[1]), argv[2]);

`ft_send_msg()`

static void ft_send_msg(pid_t pid, char *msg);

To keep track of the current index of the message being sent, a local integer variable i is created and initialized to 0.
Before sending the message we must first store the message's length in the integer variable msglen.

int i;
int msglen;

i = 0;
msglen = ft_strlen(msg);
ft_printf("%sOutbound msg's length = %d%s\n", CYN, msglen, NC);

The message length is sent bit-by-bit using the function ft_send_int().

ft_send_int(pid, msglen);

Then it loops through the message and sends each character to the server bit-by-bit:

ft_printf("\n%sSending Message%s\n", GRN, NC);
while (msg[i] != '\0')
	ft_send_char(pid, msg[i++]);
ft_printf("\n");
ft_sep_color('0', '=', 28, GRN);
ft_printf("%sSending NULL Terminator%s\n", MAG, NC);
ft_sep_color('0', '=', 28, GRN);

Then all that is left to do is send a null terminator to the server to mark the end of the message:

ft_send_char(pid, '\0');
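To get a feel for the cost of this framing, the number of data signals per message can be computed directly: one int for the header, one byte per char, and one final '\0' byte. The helper name frame_signal_count is hypothetical, and strlen() stands in for ft_strlen().

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Number of data signals needed for one message:
 * header (one int) + one byte per char + one terminating '\0' byte.
 */
static size_t	frame_signal_count(const char *msg)
{
	size_t	header_bits;
	size_t	payload_bits;

	header_bits = sizeof(int) * CHAR_BIT;
	payload_bits = (strlen(msg) + 1) * CHAR_BIT;
	return (header_bits + payload_bits);
}
```

With a 32-bit int, the two-character message "hi" takes 32 + 3 * 8 = 56 signals from client to server, not counting the acknowledgements coming back.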

`ft_send.c`

To send chars and ints to the server two helper functions were implemented: ft_send_char() and ft_send_int().

`ft_send_char()` & `ft_send_int()`

void	ft_send_int(pid_t pid, int num);
void	ft_send_char(pid_t pid, char c);

These two functions work in similar ways.

They first initialize a bitshift integer variable with the index of the most significant bit of the data type about to be sent (its size in bits minus 1):

int		bitshift;

bitshift = ((sizeof(int) * 8) - 1);  // Prepare the server to receive 32 bits
...
bitshift = ((sizeof(char) * 8) - 1); // Prepare the server to receive 8 bits

Important

bitshift will be used to iterate through each bit of the data being sent, from the most significant bit (MSB) to the least significant bit (LSB).

Sending Data

The client enters a loop running from bitshift to 0:

It breaks the char/int into its individual bits;
Each bit is passed as an argument to ft_send_bit() where it triggers the appropriate signal and is sent to the server;
bitshift is decremented to move to the next bit of the binary representation of the char/int, from left to right;

while (bitshift >= 0)
{
	bit = (num >> bitshift) & 1; // Get the current bit
	ft_send_bit(pid, bit, 1);    // Send the current bit
	--bitshift;                  // Move to the next bit
}
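The same loop can be tried without a server by writing the bits into a string instead of sending signals. The helper name bits_msb_first is hypothetical. It takes an unsigned value so that the right shift is well defined for every input.

```c
/*
 * Write the `width` low bits of `value` into `out` as '0'/'1'
 * characters, most significant bit first, then terminate the string.
 * `out` must have room for width + 1 characters.
 */
static void	bits_msb_first(unsigned int value, int width, char *out)
{
	int	bitshift;
	int	i;

	bitshift = width - 1;
	i = 0;
	while (bitshift >= 0)
	{
		if ((value >> bitshift) & 1U)
			out[i] = '1';
		else
			out[i] = '0';
		++i;
		--bitshift;
	}
	out[i] = '\0';
}
```

Each '0' in the resulting string corresponds to one SIGUSR1 and each '1' to one SIGUSR2, in the order they would be sent.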

`ft_send_bit()`

void	ft_send_bit(pid_t pid, char bit, char pause_flag);

ft_send_bit() sends information to the server bit-by-bit.

It simply checks if the passed bit is 1 or 0 and sends the appropriate signal using kill().
If the call to kill() fails, the program writes an error message to stderr and exits.

if (bit == 0)
{
	if (kill(pid, SIGUSR1) < 0)
		ft_perror_exit("kill() failed sending SIGUSR1\n");
}
else if (bit == 1)
{
	if (kill(pid, SIGUSR2) < 0)
		ft_perror_exit("kill() failed sending SIGUSR2\n");
}

If the pause_flag is set to 1, the sender calls pause() and waits for a signal from the other side before sending the next bit.

if (pause_flag != 0)
	pause();

Note

This function is called with pause_flag = 1 when used in the context of ft_send_char() and ft_send_int(), so that for each bit sent the client waits for a confirmation signal from the server before proceeding to send the data.
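One thing to keep in mind with the kill() then pause() pattern is that an acknowledgement arriving between the two calls is handled before pause() starts, and the sender can then block indefinitely. A common alternative, sketched here as an assumption and not as what the project does, keeps SIGUSR1 blocked and only lets it through inside sigsuspend(). The names g_ack, ack_handler, ack_setup and ack_wait are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag set by the handler when an acknowledgement arrives. */
static volatile sig_atomic_t	g_ack = 0;

static void	ack_handler(int sig)
{
	(void)sig;
	g_ack = 1;
}

/*
 * Install the handler and block SIGUSR1, so that it can only be
 * delivered while ack_wait() is inside sigsuspend().
 * Stores the previous mask in *oldmask. Returns 0 on success.
 */
static int	ack_setup(sigset_t *oldmask)
{
	struct sigaction	sa;
	sigset_t			block;

	memset(&sa, 0, sizeof(sa));
	sigemptyset(&sa.sa_mask);
	sa.sa_handler = ack_handler;
	if (sigaction(SIGUSR1, &sa, NULL) < 0)
		return (-1);
	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	return (sigprocmask(SIG_BLOCK, &block, oldmask));
}

/* Wait until the handler has run at least once, then clear the flag. */
static void	ack_wait(const sigset_t *oldmask)
{
	sigset_t	wait_mask;

	wait_mask = *oldmask;
	sigdelset(&wait_mask, SIGUSR1);
	while (!g_ack)
		sigsuspend(&wait_mask);
	g_ack = 0;
}
```

Because the signal stays pending while it is blocked, an acknowledgement that arrives early is not lost: it is delivered as soon as sigsuspend() swaps in the wait mask.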

`ft_sigaction.c`

This file contains only a wrapper for sigaction(), used to set both the server's and the client's event handlers for the SIGUSR1 and SIGUSR2 signals.

Once again, error handling is done using control expressions inside if statements:

void	ft_set_sigaction(struct sigaction *sa)
{
	if (sigaction(SIGUSR1, sa, NULL) < 0)
		ft_perror_exit("sigaction() failed to handle SIGUSR1");
	if (sigaction(SIGUSR2, sa, NULL) < 0)
		ft_perror_exit("sigaction() failed to handle SIGUSR2");
}

Usage 🏁

To try and test minitalk:

First clone the repository:

git clone git@github.com:PedroZappa/42_minitalk.git

Then fetch the project's dependencies and compile the executables:

cd 42_minitalk
make

Get the server spinning:

./server

Now, on a different terminal, run the client:

./client [server-pid] [message]

The client will send the passed message to the target server with the given pid.

Testing 🧪

If you're like me and use tmux you can quickly test the project using the following make rules:

Conveniently spin up a server on a new tmux window-split:

make serve

To automatically launch a few clients on new tmux window-splits:

make test

To run harder tests with longer messages including Unicode characters:

make stress_test

Appendix 📖

`Unicode` Character Encoding

Unicode, like other character encodings, functions as a lookup table mapping code points to characters.

The most important difference between Unicode and ASCII is that ASCII only defines 128 characters, while Unicode encodings can use up to 32 bits per character. A 32-bit value allows for over 4 billion unique values, far more than needed to include every character set in existence, so the standard restricts code points to the range U+0000 to U+10FFFF.

Variable Length Encoding

UTF-8, the most widely used Unicode encoding, takes a smart approach when it comes to character encoding. If a character can be represented by just 1 byte, that's all the space that will be used. This memory efficient technique is known as variable length encoding.

For example, a common character like C takes 8 bits in memory, while special, rarer characters like 💩 need up to 32 bits (4 bytes) to be stored in memory.
This means a mostly-ASCII document like the present README takes about four times less space when encoded in UTF-8 than it would if encoded in UTF-32, making the page take less space in memory and load faster.
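The byte counts above follow from the UTF-8 length rules, which can be sketched as a small function. The name utf8_len is hypothetical; the ranges are the ones defined for UTF-8.

```c
#include <stdint.h>

/*
 * Number of bytes UTF-8 uses for a code point,
 * or 0 if the value cannot be encoded.
 */
static int	utf8_len(uint32_t cp)
{
	if (cp <= 0x7F)
		return (1);
	if (cp <= 0x7FF)
		return (2);
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return (0); /* surrogates are not valid on their own */
	if (cp <= 0xFFFF)
		return (3);
	if (cp <= 0x10FFFF)
		return (4);
	return (0);
}
```

Here utf8_len('C') gives 1 and utf8_len(0x1F4A9), the code point of 💩, gives 4. For minitalk this means a single emoji travels as four separate chars, 32 signals in total.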

Code Points

Unicode characters can be referenced by their code point.

A code point is an irreducible, atomic unit of information.
A text document is a sequence of code points.
Each code point represents a number with a particular meaning in the Unicode standard.
The current Unicode standard defines 1,114,112 code points.
These code points are further divided into 17 planes, or groupings.
Each plane is identified by a number from 0 to 16.
The number of code points in each plane is 65,536 ($2^{16}$).
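Since each plane holds $2^{16}$ code points, the plane of a code point is simply its value shifted right by 16 bits. A minimal sketch, with the hypothetical names UNICODE_MAX and unicode_plane:

```c
#include <stdint.h>

#define UNICODE_MAX 0x10FFFFu /* last valid code point */

/* Plane number (0 to 16) of a code point, or -1 if out of range. */
static int	unicode_plane(uint32_t cp)
{
	if (cp > UNICODE_MAX)
		return (-1);
	return ((int)(cp >> 16));
}
```

The letter A (U+0041) sits in plane 0, while 💩 (U+1F4A9) sits in plane 1. The totals also line up: 17 planes of 65,536 code points give 1,114,112.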

To access a given code point we use the following syntax:

U+(hexadecimal representation of a code point)

Note

Hexadecimal values are used to represent the code points because they make it easier to reference large values.

Character	Code Point	Binary Representation
💩	U+1F4A9	0001 1111 0100 1010 1001
🌟	U+1F31F	0001 1111 0011 0001 1111

Grapheme Clusters

Some characters can be expressed as a combination of multiple code points known as grapheme clusters.

Character	Code Point
🧑	U+1F9D1
🌾	U+1F33E
🧑‍🌾	U+1F9D1 U+200D U+1F33E

Important

U+200D is a zero-width joiner.
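The gap between what the reader sees and what gets transmitted can be checked with a short sketch that counts code points in a UTF-8 string by skipping continuation bytes (those of the form 10xxxxxx). The name utf8_code_points is hypothetical, and the function assumes well-formed input.

```c
#include <stddef.h>

/*
 * Count the code points in a UTF-8 string.
 * Every byte that is not a continuation byte (10xxxxxx) starts
 * a new code point. Assumes the input is well-formed UTF-8.
 */
static size_t	utf8_code_points(const char *s)
{
	size_t	count;

	count = 0;
	while (*s)
	{
		if (((unsigned char)*s & 0xC0) != 0x80)
			++count;
		++s;
	}
	return (count);
}
```

Encoded in UTF-8, 🧑‍🌾 is the 11-byte sequence F0 9F A7 91, E2 80 8D, F0 9F 8C BE: one visible character, three code points, eleven chars on the wire. Assuming ft_strlen() counts bytes like strlen(), the length header for this message would announce 11.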

License

This work is published under the terms of 42 Unlicense.
