This is the final project for a university Systems Programming course. According to the professor's directions, it must be a client-server system for remote file ciphering (and/or deciphering), following the pattern of the notorious class of malware known as ransomware. Once the software is installed on the attacked machine, the client is what steers the attack. The application must support both Linux and Windows, behaving exactly the same way on both.
The name was not imposed by the professor; it was chosen for two reasons:
- the pun on CryptoLocker, one of the most famous ransomware attacks since 2013;
- the playful tone it gives the project, which was developed for educational purposes, underlining the simplicity of the alphabetic encryption system used.
The project contains two main elements:
- `src` (folder): contains the source code, for both the Linux and Windows platforms;
- `Makefile` (regular file): contains all the rules needed to automate the build process.
The physical layout of files and folders was designed to separate every module by communication edge and by host platform:
- communication edge:
  - server
  - client
- host platform:
  - Linux
  - Windows
The communication-edge distinction is implemented by the `client` and `server` folders (directly under `src`); a `common` folder holds the code used by both edges.
Each of these containers (`server` and `client`) implements the host-platform distinction in turn, through `linux` and `windows` folders and, where needed, a `common` folder as well (like the one mentioned above, it holds cross-platform code).
As per the professor's directions, the `server` has to meet several requirements:
- a TCP socket must be instantiated, listening on every network interface on the conventional port `8888` (unless otherwise specified with the `-p` flag);
- the handling of each request is delegated to a thread. The `n` threads (4 by convention, or as specified with the `-n` flag) are managed by a threadpool (more details below);
- `server` operations are limited to the elements (recursively) contained in the folder that must be specified with the `-c` flag;
- the supported operations are the following:
  - `LSTF`: lists the files contained in the folder, along with their size in bytes;
  - `LSTR`: recursively lists the files contained in the folder, along with their size in bytes;
  - `ENCR seed file`: ciphers, with a key generated from `seed`, the content of `file` into `file_enc` and then removes `file`;
  - `DECR seed file_enc`: deciphers, with a key generated from `seed`, the content of `file_enc` into `file` and then removes `file_enc`;
- a configuration file can be specified with the `-f` flag, containing the values corresponding to the `-p` and `-n` flags.
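As a rough illustration of the command set listed above, the sketch below parses one request line into a command kind, a seed and a file name. It is not the project's code: the `command` type, the `parse_command()` name, the 255-character path limit and the assumption that fields are separated by whitespace are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical types: none of these names come from the project itself. */
typedef enum { CMD_LSTF, CMD_LSTR, CMD_ENCR, CMD_DECR, CMD_INVALID } cmd_kind;

typedef struct {
    cmd_kind kind;
    unsigned int seed; /* only meaningful for ENCR / DECR */
    char path[256];    /* only meaningful for ENCR / DECR */
} command;

/* Parse one request line such as "LSTF" or "ENCR 42 notes.txt".
 * Assumes whitespace-separated fields; anything else is CMD_INVALID. */
static command parse_command(const char *line) {
    command cmd = { CMD_INVALID, 0, "" };
    char verb[16] = "";

    if (sscanf(line, "%15s", verb) != 1) {
        return cmd; /* empty line */
    }
    if (strcmp(verb, "LSTF") == 0) {
        cmd.kind = CMD_LSTF;
    } else if (strcmp(verb, "LSTR") == 0) {
        cmd.kind = CMD_LSTR;
    } else if (strcmp(verb, "ENCR") == 0 || strcmp(verb, "DECR") == 0) {
        /* Both ciphering commands expect "<seed> <file>" after the verb. */
        if (sscanf(line, "%*s %u %255s", &cmd.seed, cmd.path) == 2) {
            cmd.kind = (verb[0] == 'E') ? CMD_ENCR : CMD_DECR;
        }
    }
    return cmd;
}
```

File names containing spaces would need a different framing from the one this sketch assumes.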
The `server` implementation follows a few simple steps, explained below.
The first operation is the scan and parsing of the input arguments. The getopt library is mainly used for this, because it makes it simple to associate flags with values and to report optional / mandatory input combinations. The parameters are then validated:
- existence of the configuration file: if it exists, its values are parsed;
- existence of the folder to which the application's execution is confined;
- validity of the specified port number;
- validity of the specified maximum number of threads for the threadpool.
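The parsing and validation steps above can be sketched with POSIX `getopt()` as follows. This is only an illustration under stated assumptions: the `server_options` structure, the function names, the upper bound of 1024 threads and the rule that at least one of `-c` and `-f` must be present are hypothetical, and checking that the folder and the configuration file actually exist is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the parsed flags. */
typedef struct {
    const char *folder;      /* -c */
    const char *config_file; /* -f */
    long port;               /* -p, defaults to 8888 */
    long threads;            /* -n, defaults to 4 */
} server_options;

/* Convert text to a long within [min, max]; 0 on success, -1 otherwise. */
static int parse_long_in_range(const char *text, long min, long max, long *out) {
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || value < min || value > max) {
        return -1;
    }
    *out = value;
    return 0;
}

/* Returns 0 when the flag combination is acceptable, -1 otherwise. */
static int parse_server_options(int argc, char *argv[], server_options *opts) {
    int flag;

    opts->folder = NULL;
    opts->config_file = NULL;
    opts->port = 8888;
    opts->threads = 4;
    optind = 1; /* restart the scan in case getopt() was used before */
    opterr = 0; /* report errors through the return value only */

    while ((flag = getopt(argc, argv, "c:f:p:n:")) != -1) {
        switch (flag) {
        case 'c': opts->folder = optarg; break;
        case 'f': opts->config_file = optarg; break;
        case 'p':
            if (parse_long_in_range(optarg, 1, 65535, &opts->port) != 0) return -1;
            break;
        case 'n': /* 1024 is an arbitrary upper bound chosen for this sketch */
            if (parse_long_in_range(optarg, 1, 1024, &opts->threads) != 0) return -1;
            break;
        default:
            return -1; /* unknown flag or missing value */
        }
    }
    /* Assumption: the folder comes either from -c or from the -f file. */
    return (opts->folder != NULL || opts->config_file != NULL) ? 0 : -1;
}
```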
Once user input has been validated and handled, the application configures itself:
- the threadpool `init()` function is called to initialize the maximum number of threads (more details below);
- a `WSADATA` structure is instantiated: it is used by the socket handling procedures (Windows platform only);
- a passive TCP socket is instantiated on the specified port and set to listen for incoming connections.
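The last step, creating the passive socket, might look like the following on a POSIX system. This is a minimal sketch, not the project's implementation: `open_listener()` and its `backlog` parameter are hypothetical names, and the Windows variant (which has to call `WSAStartup()` with the `WSADATA` structure first) is not covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to every interface on `port` and start
 * listening. Returns the socket descriptor, or -1 on failure. */
static int open_listener(unsigned short port, int backlog) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    /* Allow a quick restart of the server on the same port. */
    int reuse = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof reuse);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* any network interface */
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Each descriptor later returned by `accept()` on this socket would be the "active socket" handed to the connection handler.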
Once an active socket is created, a pointer to it is passed as a parameter to the `handle_connection()` function, which is in charge of the server-client conversation for its whole duration. Inside a for loop, this function scans and reacts to every command requested by the other side of the communication:
- `LSTF`/`LSTR`: these two commands use the same recursive function `list(char *ret_out, int recursive)` (which in turn calls `list_opt(char *ret_out, int recursive, char *folder, char *folder_suffix)`): when calling it, `LSTF` indirectly sets the boolean (an integer) `recursive` to 0, while `LSTR` sets it to 1. In both cases, the `*folder` parameter matches the `*arg_folder` pointer (the folder in which the application is running) and `*folder_suffix` is set to NULL. While scanning the content of `*folder`, if another folder is met and `recursive` is true, the function calls itself again, populating `*folder_suffix` with the suffix that must be appended to the initial folder to build the path of the folder just met.
- `ENCR`/`DECR`: the implementation of the two commands has been unified by exploiting a property of ciphering with the exclusive disjunction (XOR) operator: given a key k, if a sequence of characters is ciphered by applying the bitwise XOR with k, reapplying the same operation with the same k yields the initial sequence again. The two commands therefore execute the same procedure (described in more detail below), differing only in the configuration of the input/output files.
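The recursive listing described for `LSTF`/`LSTR` can be sketched on a POSIX system as shown here. The function is a simplified stand-in, not the project's `list_opt()`: its name and signature are hypothetical, it writes to a `FILE *` instead of filling `ret_out`, it uses an empty string rather than NULL for the initial suffix, and it skips symbolic links (an assumption made to avoid loops).

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Write "<relative path> <size in bytes>" for each regular file found in
 * root/suffix. Returns the number of files listed, or -1 if the folder
 * cannot be opened. Pass "" as suffix for the top-level call. */
static int list_files(FILE *out, const char *root, const char *suffix, int recursive) {
    char dir_path[4096];
    snprintf(dir_path, sizeof dir_path, "%s/%s", root, suffix);

    DIR *dir = opendir(dir_path);
    if (dir == NULL) {
        return -1;
    }

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }

        /* Extend the suffix with the entry just met. */
        char relative[4096];
        char full[8192];
        snprintf(relative, sizeof relative, "%s%s%s",
                 suffix, suffix[0] != '\0' ? "/" : "", entry->d_name);
        snprintf(full, sizeof full, "%s/%s", root, relative);

        struct stat info;
        if (lstat(full, &info) != 0) {
            continue; /* entry vanished or is unreadable: skip it */
        }
        if (S_ISREG(info.st_mode)) {
            fprintf(out, "%s %lld\n", relative, (long long)info.st_size);
            count++;
        } else if (S_ISDIR(info.st_mode) && recursive) {
            /* Same function, one level deeper, with the longer suffix. */
            int nested = list_files(out, root, relative, recursive);
            if (nested > 0) {
                count += nested;
            }
        }
    }
    closedir(dir);
    return count;
}
```

Calling it with `recursive` set to 0 corresponds to `LSTF`, and with 1 to `LSTR`.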
The `threadpool` module uses custom data structures to reduce both the complexity of the problems it handles and the cost of multi-platform support. `job_t` is the most basic structure: it contains all the information about a task to be executed within the threadpool, namely pointers to a function and to its arguments, plus a pointer to the next `job_t`. The `threadpool_t` structure, in turn, gathers the information needed for the threadpool to work correctly, such as the maximum number of threads, the list of threads itself, and the mutex and condition variable that regulate access to the structure's internal fields.
There are several ways to interact with the threadpool from the outside:
- `threadpool_init()`: initializes the data structures used by the module and sets the maximum number of threads usable by the threadpool.
- `threadpool_add_job()`: adds a new operation, marked as pending, to the threadpool queue.
- `threadpool_bye()`: performs all memory cleanup operations before the threadpool stops.
Every thread is configured to execute the module's static function `thread_boot`, which does nothing but execute a task (more precisely, a `job_t`) currently pending on the threadpool queue. It takes the task only after acquiring the lock on the mutex, so that the information about the next task and the number of pending tasks can be updated safely.
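To make the description of the module more concrete, here is a compact POSIX-threads sketch of such a pool. Only the names `job_t`, `threadpool_t`, `threadpool_init()`, `threadpool_add_job()`, `threadpool_bye()` and `thread_boot` come from the text; the signatures, the `head`/`tail`/`pending`/`stopping` fields and the `square_in_place()` demo job are assumptions, and the Windows primitives are not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

typedef struct job {
    void (*function)(void *); /* task to run */
    void *args;               /* argument handed to the task */
    struct job *next;         /* next pending job, or NULL */
} job_t;

typedef struct {
    int max_threads;
    pthread_t *threads;
    job_t *head, *tail;       /* FIFO queue of pending jobs */
    int pending;
    int stopping;             /* set by threadpool_bye() */
    pthread_mutex_t lock;     /* guards the queue fields above */
    pthread_cond_t has_work;  /* signalled on new job or shutdown */
} threadpool_t;

/* Worker loop: take one job under the lock, run it outside the lock. */
static void *thread_boot(void *arg) {
    threadpool_t *pool = arg;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->head == NULL && !pool->stopping) {
            pthread_cond_wait(&pool->has_work, &pool->lock);
        }
        if (pool->head == NULL) { /* stopping and queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        job_t *job = pool->head;
        pool->head = job->next;
        if (pool->head == NULL) pool->tail = NULL;
        pool->pending--;
        pthread_mutex_unlock(&pool->lock);
        job->function(job->args);
        free(job);
    }
}

static int threadpool_init(threadpool_t *pool, int max_threads) {
    pool->head = pool->tail = NULL;
    pool->pending = pool->stopping = 0;
    pool->max_threads = 0; /* counts only the threads that really started */
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->has_work, NULL);
    pool->threads = malloc(sizeof(pthread_t) * (size_t)max_threads);
    if (pool->threads == NULL) return -1;
    for (int i = 0; i < max_threads; i++) {
        if (pthread_create(&pool->threads[i], NULL, thread_boot, pool) != 0)
            return -1;
        pool->max_threads++;
    }
    return 0;
}

static int threadpool_add_job(threadpool_t *pool, void (*function)(void *), void *args) {
    job_t *job = malloc(sizeof *job);
    if (job == NULL) return -1;
    job->function = function;
    job->args = args;
    job->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail != NULL) {
        pool->tail->next = job;
    } else {
        pool->head = job;
    }
    pool->tail = job;
    pool->pending++;
    pthread_cond_signal(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    return 0;
}

/* Let the workers drain the queue, join them, then release everything. */
static void threadpool_bye(threadpool_t *pool) {
    pthread_mutex_lock(&pool->lock);
    pool->stopping = 1;
    pthread_cond_broadcast(&pool->has_work);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < pool->max_threads; i++) {
        pthread_join(pool->threads[i], NULL);
    }
    free(pool->threads);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->has_work);
}

/* Demo job used only to exercise the pool: squares the pointed-to int. */
static void square_in_place(void *args) {
    int *value = args;
    *value = *value * *value;
}
```

In this sketch `threadpool_bye()` waits for pending jobs to finish before returning; whether the project drains or discards the queue on shutdown is not stated in the text.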
The `cipher` module has a very simple structure, consisting of a single `cipher()` function that takes multiple arguments: two char pointers, which are the paths of the input and output files of the ciphering procedure, and an unsigned int used as the seed for key generation. The body of this function has three steps:
- initialization of the file descriptors (on Linux) or `HANDLE`s (on Windows) and of the memory maps of the input/output files, after obtaining the lock on the input file;
- actual ciphering of the input file into the output file;
- closing of the file descriptors (on Linux) or `HANDLE`s (on Windows) and of the memory maps of the input/output files, after releasing the lock on the input file.
For the parallelization problem, the use of the OpenMP API was allowed. This choice is motivated by two reasons:
- it is a multi-platform API, so no code differences between the systems are needed;
- it is extremely simple to use.
According to the rules set by the professor, parallelization must be applied only if the file to be ciphered is larger than 256 KB. This is why the process works with two nested for loops. The outer one is parallelized with OpenMP and iterates over every 256 KB portion of the file; if the file is smaller than 256 KB, this loop has a single iteration and the work is therefore executed sequentially. The inner loop iterates over every 4 bytes (the size of an integer) that make up the 256 KB portion, and the ciphering is computed for each of them.
One of the problems encountered was finding the simplest way to make the one-time pad ciphering method coexist with parallelization on files larger than 256 KB. Although partitioning the file into 256 KB blocks simplified parallelization, parallelization itself introduced a new problem: its execution order is not deterministic. In a sequential scenario, starting from the same seed, the same key is always generated for every iteration element. In a parallelized scenario this no longer holds, since the execution order of the for loop iterations cannot be predicted. To solve this, an additional memory map is used to generate the ciphering keys sequentially and in advance. Before the two nested for loops that perform the actual ciphering run, a new memory map (of the same size as the input file) is instantiated and populated, by a separate for loop, with the keys generated using `rand_r()` (or `cipher_rand()` on Windows). The two nested loops then read, for each iteration element, the corresponding key from this memory map instead of generating it dynamically.
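The approach just described (keys generated sequentially first, then a parallel XOR pass over 256 KB blocks) can be sketched as follows. This is a simplified illustration rather than the project's `cipher()`: it works on in-memory buffers instead of memory-mapped files, it uses `malloc()` in place of the extra memory map, the names `xor_cipher()` and `BLOCK_WORDS` are hypothetical, and input sizes that are not a multiple of 4 bytes are not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_WORDS (256 * 1024 / 4) /* 256 KB expressed in 4-byte words */

/* XOR `count` 32-bit words from `in` into `out` with a key stream derived
 * from `seed`. Running it again on the output with the same seed restores
 * the input. Returns 0 on success, -1 if the key buffer cannot be allocated. */
static int xor_cipher(const uint32_t *in, uint32_t *out, size_t count, unsigned int seed) {
    uint32_t *keys = malloc(count * sizeof *keys);
    if (keys == NULL && count > 0) {
        return -1;
    }

    /* Step 1: sequential key generation, so that key i is the same no
     * matter how the parallel loop below is scheduled. */
    for (size_t i = 0; i < count; i++) {
        keys[i] = (uint32_t)rand_r(&seed);
    }

    /* Step 2: one outer iteration per 256 KB block, run in parallel when
     * OpenMP is enabled; one inner iteration per 4-byte word. */
    long blocks = (long)((count + BLOCK_WORDS - 1) / BLOCK_WORDS);
    #pragma omp parallel for
    for (long b = 0; b < blocks; b++) {
        size_t start = (size_t)b * BLOCK_WORDS;
        size_t end = (start + BLOCK_WORDS < count) ? start + BLOCK_WORDS : count;
        for (size_t i = start; i < end; i++) {
            out[i] = in[i] ^ keys[i];
        }
    }

    free(keys);
    return 0;
}
```

With GCC the outer loop only runs in parallel when the file is compiled with `-fopenmp`; without that flag the pragma is ignored and the same code runs sequentially, producing identical output.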
As a reading of the Windows variant of the module's code shows, it contains a function that is absent from the Linux implementation: `cipher_rand()`. It is a pseudo-random number generator used to derive the key from the seed. It is needed because Windows does not provide a system implementation of `rand_r()`. For this reason the function below, taken from the `rand_r()` implementation offered by MinGW (more details below), has been included.
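```c
#include <stdlib.h> /* RAND_MAX */

/* Reentrant pseudo-random generator: a Park-Miller "minimal standard"
 * step, computed with Schrage's method to avoid overflow. */
static int cipher_rand(unsigned int *seed) {
    long k;
    long s = (long)(*seed);
    if (s == 0) {
        s = 0x12345987; /* a zero state would stay zero forever */
    }
    k = s / 127773;
    s = 16807 * (s - k * 127773) - 2836 * k;
    if (s < 0) {
        s += 2147483647;
    }
    (*seed) = (unsigned int)s; /* store the new state for the next call */
    return (int)(s & RAND_MAX);
}
```

`client` is the simplest portion of code in the project. It is made of three parts: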
- arguments parsing

  Unlike the `server` implementation, no external auxiliary library is used here to parse the arguments; a more handcrafted method, tailored to the needs of the case, has been preferred instead. This choice was driven by the ambiguity between input flags that take no arguments and those that do, and between flags that indicate analogous commands or that need arguments on the `server` side.
- creation of the socket needed to connect to a `server`

  As for socket management, this is the most redundant portion of code when compared with the `server` module; the only difference is that the `server` socket waits for connections, whereas the `client` socket requests a connection to one previously allocated by the `server`.
- back-and-forth with the server

  This phase follows a simple scheme:
  - translation of the `client` input flags into commands supported by the `server` (e.g., the `client` flag `-l` is translated into the `server` command `LSTF`);
  - sending the command to the `server` via the socket;
  - receiving a reply from the `server` and printing it.
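The flag-to-command translation step can be pictured with a small sketch like the one below. Only the `-l` to `LSTF` mapping is stated explicitly in the text; the other three mappings are inferred from the client usage lines, and the `build_request()` name, its signature and the space-separated request format are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Build the request text for a client flag into `out`.
 * `seed` and `file` are only used by -e and -d and may be NULL otherwise.
 * Returns 0 on success, -1 on an unknown flag, missing argument or
 * insufficient buffer space. */
static int build_request(char *out, size_t size, const char *flag,
                         const char *seed, const char *file) {
    int written;

    if (strcmp(flag, "-l") == 0) {
        written = snprintf(out, size, "LSTF");
    } else if (strcmp(flag, "-R") == 0) {
        written = snprintf(out, size, "LSTR");
    } else if (strcmp(flag, "-e") == 0 && seed != NULL && file != NULL) {
        written = snprintf(out, size, "ENCR %s %s", seed, file);
    } else if (strcmp(flag, "-d") == 0 && seed != NULL && file != NULL) {
        written = snprintf(out, size, "DECR %s %s", seed, file);
    } else {
        return -1;
    }
    /* snprintf reports the length it needed: reject truncated output. */
    return (written >= 0 && (size_t)written < size) ? 0 : -1;
}
```

The resulting string is what would then be sent over the socket before waiting for the reply.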
Setting up the development environment and compiling on Linux is relatively simple, as most Linux distributions provide a base group of development packages. On the environment where the code was written:

```
# eopkg it -c system.devel
```

After that, running make is enough:

```
# make [server|client]
```
This software has been written and tested on the following environment:
| Component | Details |
|---|---|
| Kernel | Linux 4.9.45 x86_64 |
| Distribution | Solus Project |
| RAM | 20 GB |
| CPU | Intel Core i7-4770K Haswell |
| Type | Physical machine |
Configuring and compiling the project on Windows is a little more complicated. Auxiliary tools are used to simplify the compilation phase and to keep the differences between the build rules in the Makefile to a minimum: this is why msys2 and mingw-w64 have been adopted. The configuration proceeds as follows:
- install `msys2` from the official site: http://www.msys2.org
- launch the application and update the package database: `# pacman -Syu`
- update the remaining packages: `# pacman -Su`
- install `mingw-w64`: `# pacman -S mingw-w64-x86_64-gcc` (on a 32-bit architecture, install `mingw-w64-i686-gcc` instead)
- install the base development dependencies: `# pacman -S base-devel`
- optional: to use the packages installed above from PowerShell as well, add the paths of the binaries to the global Windows `PATH` variable: `C:\path\to\msys2\usr\bin` and `C:\path\to\msys2\mingw64\bin`.
This software has been written and tested on the following environment:
| Component | Details |
|---|---|
| OS | Windows 10 Pro x86_64 |
| RAM | 4096 MB |
| CPU | Intel Core i7-4770K Haswell |
| Type | Virtual machine |
`server` can be used in the following ways (assuming the current working directory is the root folder of the project, once it has been compiled):

```
# ./bin/cryptoloackerd -c /path [-n max-threads -p port]
```

You can also put the `/path` and the port number in a configuration file and have the server load those values from it, passing the file as a parameter:

```
# ./bin/cryptoloackerd -f /path/file.txt [-n max-threads]
```

In this case, the file must follow this template:

```
# cat /path/file.txt
folder = /path
port = 8888
```
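A file following this template could be read one line at a time with something like the sketch below. It is only an illustration: the `file_config` type and `apply_config_line()` name are hypothetical, only the two keys shown in the template (`folder` and `port`) are recognized, and values containing spaces are not supported.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the values read from the configuration file. */
typedef struct {
    char folder[256];
    long port;
} file_config;

/* Apply one "key = value" line to cfg.
 * Returns 0 if the line was recognized and valid, -1 otherwise. */
static int apply_config_line(const char *line, file_config *cfg) {
    char key[32];
    char value[256];

    /* Key: up to 31 characters without '=', space or tab; value: one word. */
    if (sscanf(line, " %31[^= \t] = %255s", key, value) != 2) {
        return -1;
    }
    if (strcmp(key, "folder") == 0) {
        strcpy(cfg->folder, value); /* value holds at most 255 chars + NUL */
        return 0;
    }
    if (strcmp(key, "port") == 0) {
        char *end = NULL;
        long port = strtol(value, &end, 10);
        if (*end != '\0' || port < 1 || port > 65535) {
            return -1;
        }
        cfg->port = port;
        return 0;
    }
    return -1; /* unknown key */
}
```

A caller would typically feed it every line obtained with `fgets()` and then let the command-line flags override or complete the result.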
`client` can be used in the following ways (assuming the current working directory is the root folder of the project, once it has been compiled):
- to execute `LSTF` or `LSTR`: `# ./bin/cryptoloacker -h server-ip -p port [-l|-R]`
- to execute `ENCR` or `DECR`: `# ./bin/cryptoloacker -h server-ip -p port [-e|-d] seed /path/file`