sparse-net

In this project I have implemented the forward function of a Neural Network composed of sparsely connected layers. To parallelize the forward function I have built two implementations: one uses OpenMP and the other uses CUDA.

Fast Deployment

To check that the program works correctly, run the following steps:

OpenMP:

  • cd impl/openmp
  • bash compile.sh
  • ./nn_openmp -n 5000 -k 100

CUDA:

  • cd impl/cuda
  • bash compile.sh
  • ./nn_cuda -n 5000 -k 100

Project Structure

The project is divided into two folders, openmp and cuda, both inside the impl folder.

Each folder contains: a compile.sh script that compiles the corresponding implementation; test.sh, which takes the measurements; clean_script.sh, which cleans the test scripts (more on this in the Measurements section).

Both openmp and cuda have a src folder containing their respective source files. Both implementations share:

  • main(.cpp): collects the command line arguments and starts the Neural Network's forward function.
  • nn(.cpp/.cu): Neural Network implementation.
  • sparselayer(.cpp/.cu): Layer implementation.
  • mat(.cpp): two-dimensional wrapper for std::vector (used for the Layer's weights).
  • vector_utils(.cpp): utility functions for std::vectors (create random vectors, write a vector to a file, etc.).
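The mat wrapper is described only as a two-dimensional wrapper for std::vector. A minimal sketch of such a wrapper, assuming row-major storage in a single contiguous std::vector; the class name Mat and its methods are hypothetical and not necessarily what mat.cpp contains:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D wrapper: rows x cols values stored row-major
// in one contiguous std::vector.
template <typename T>
class Mat {
public:
    Mat(std::size_t rows, std::size_t cols, T init = T())
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}

    // Element (i, j) lives at offset i * cols + j.
    T& at(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }
    const T& at(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Raw pointer to the contiguous buffer, e.g. for a single bulk copy.
    T* data() { return data_.data(); }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};
```

Keeping all weights in one flat buffer, rather than a vector of vectors, lets the whole matrix be transferred (for example to the GPU) with a single copy.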

The CUDA implementation also has:

  • sparselayer_forward_kernel(.cu): contains the kernel that performs the layer computation.

The output folder (one for each implementation) stores the outputs of the forward function saved with the option "-so true" (more on this option in the Command Line Arguments section).

The folder test_results (one for each implementation) contains the .csv files generated by the script "test.sh".

Compilation

To compile the source files, it is sufficient to run the script compile.sh (one in each folder) with the command bash compile.sh. The compilation produces an executable called nn_openmp for OpenMP and nn_cuda for CUDA.

Command Line Arguments

To run the compiled program, specify the options -n (input size) and -k (number of layers).
e.g. ./nn_openmp -n 1000 -k 100
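main(.cpp) is described as collecting the command line arguments. A minimal sketch of how "-option value" pairs such as -n 1000 -k 100 could be collected, assuming every option takes exactly one value (true for all options documented in this README); parse_args is a hypothetical name, not necessarily the project's code:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical parser: collects "-flag value" pairs (e.g. -n 1000 -k 100)
// into a map keyed by the flag name without the leading '-'.
std::map<std::string, std::string> parse_args(int argc, const char* const argv[]) {
    assert(argc >= 1);  // argv[0] is the program name
    std::map<std::string, std::string> opts;
    for (int i = 1; i < argc; i += 2) {
        std::string flag = argv[i];
        if (flag.size() < 2 || flag[0] != '-')
            throw std::invalid_argument("expected an option, got: " + flag);
        if (i + 1 >= argc)
            throw std::invalid_argument("missing value for option " + flag);
        opts[flag.substr(1)] = argv[i + 1];
    }
    return opts;
}
```

The values can then be converted as needed, e.g. std::stoi(opts.at("n")); std::map::at throws std::out_of_range if a required option was not given.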

In both implementations, the option -rd false enables a deterministic computation (the data are generated with a linspace function). The option -so true saves the output to a file in the folder "impl/${implementation}/output/", where "${implementation}" is either "openmp" or "cuda".
e.g. ./nn_cuda -n 5000 -k 250 -rd false -so true creates a file named "output_N5000_K250_R3.txt".
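A sketch of the two helpers these options imply: a linspace generator for deterministic data and a function that builds the output file name following the pattern of the example above. Both names are hypothetical, and the value range the project actually passes to its linspace function is not documented here:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: `count` evenly spaced values from `start` to `stop`
// (both inclusive), giving the same data on every run.
std::vector<float> linspace(float start, float stop, std::size_t count) {
    assert(count >= 2);
    std::vector<float> v(count);
    const float step = (stop - start) / static_cast<float>(count - 1);
    for (std::size_t i = 0; i < count; ++i)
        v[i] = start + step * static_cast<float>(i);
    return v;
}

// Builds a name following the pattern output_N<N>_K<K>_R<R>.txt.
std::string output_filename(int n, int k, int r) {
    return "output_N" + std::to_string(n) + "_K" + std::to_string(k) +
           "_R" + std::to_string(r) + ".txt";
}
```

Because deterministic inputs give identical outputs across runs, files saved with -rd false -so true from the two implementations can be compared directly.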

The option -nf filename loads the Neural Network from the specified file, and -if filename loads the input from the specified file. The easiest way to create both data files is to use the Python script generate_data.py as follows:
e.g. python generate_data.py -n 30 -k 5 -r 3 creates a Neural Network with N=30, 5 Layers and parameter R=3; it also creates an input file with N=30.
The text files generated by "generate_data.py" are saved in the folder "impl/_data/".
If both "-nf" and "-if" are specified, there is no need to use the options "-n" and "-k".
If only "-nf" is specified, only the option "-n" is needed.
e.g. ./nn_openmp -nf ../_data/datafile_N30_K5_R3.txt -if ../_data/vector_N30.txt -so true also saves the output.
e.g. ./nn_openmp -n 30 -nf ../_data/datafile_N30_K5_R3.txt: the value of -n must match the N value stored in the file (30 in this case), otherwise an exception is thrown.
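The consistency check described above can be sketched as follows. check_input_size and the exception type are hypothetical, and how the project actually reads N from the network file is not shown here:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical check: the -n value must equal the N stored in the network
// file loaded with -nf; otherwise the run is rejected with an exception.
void check_input_size(int n_cli, int n_file) {
    assert(n_file > 0);  // a valid network file defines a positive N
    if (n_cli != n_file)
        throw std::invalid_argument("-n " + std::to_string(n_cli) +
                                    " does not match N=" + std::to_string(n_file) +
                                    " in the network file");
}
```

Failing early with an exception avoids reading past the end of the input vector when the declared size and the network disagree.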

In the OpenMP implementation, the option -pt PARALLELISM_TYPE selects the type of parallelism. PARALLELISM_TYPE can be "OUTER", "INNER" or "SEQUENTIAL" (the default is "OUTER").
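This README does not define the layer computation or exactly what OUTER and INNER parallelize. A minimal sketch under stated assumptions: each output neuron i is connected to the R consecutive inputs x[i..i+R-1] (so a layer with n inputs has n - R + 1 outputs), with a sigmoid activation and a single bias b; OUTER is read as parallelizing the loop over output neurons, INNER as parallelizing the R-term sum of each neuron. All names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum class Parallelism { OUTER, INNER, SEQUENTIAL };

inline float sigmoid(float t) { return 1.0f / (1.0f + std::exp(-t)); }

// Assumed sparse layer: y[i] = sigmoid(sum_r w[i*R + r] * x[i + r] + b),
// with w storing one row of R weights per output neuron.
std::vector<float> layer_forward(const std::vector<float>& x,
                                 const std::vector<float>& w,
                                 float b, int R, Parallelism mode) {
    assert(R >= 1 && x.size() >= static_cast<std::size_t>(R));
    const int m = static_cast<int>(x.size()) - R + 1;  // number of outputs
    assert(w.size() == static_cast<std::size_t>(m) * R);
    std::vector<float> y(m);

    if (mode == Parallelism::OUTER) {
        // Output neurons are independent, so they are split across threads.
        #pragma omp parallel for
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (int r = 0; r < R; ++r) sum += w[i * R + r] * x[i + r];
            y[i] = sigmoid(sum + b);
        }
    } else {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            // INNER: the R products of one neuron are summed by a parallel
            // reduction; with SEQUENTIAL the if clause keeps it single-threaded.
            #pragma omp parallel for reduction(+ : sum) if (mode == Parallelism::INNER)
            for (int r = 0; r < R; ++r) sum += w[i * R + r] * x[i + r];
            y[i] = sigmoid(sum + b);
        }
    }
    return y;
}
```

Compiled without -fopenmp the pragmas are ignored and all three modes run sequentially. With a small R, the INNER variant opens a parallel region per neuron for only a few additions, so its overhead typically outweighs the work; the sketch only illustrates where each variant places the parallelism.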

In the CUDA implementation, the option -sa false enables a parallel reduction (by default the reduction is sequential).
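The difference between the two reduction strategies can be illustrated on the CPU. The sketch below is plain C++, not the project's kernel: it emulates the tree-shaped reduction a CUDA block commonly performs in shared memory, where at each step element t adds element t + stride and the active range halves, so n values are combined in about log2(n) steps instead of n - 1 dependent additions. Function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reduction: n - 1 dependent additions.
float reduce_sequential(const std::vector<float>& v) {
    float sum = 0.0f;
    for (float x : v) sum += x;
    return sum;
}

// Tree reduction, emulated serially. In a CUDA kernel each iteration of the
// inner loop would run in a different thread, with __syncthreads() between
// strides. The buffer is zero-padded to a power of two.
float reduce_tree(std::vector<float> buf) {
    assert(!buf.empty());
    std::size_t n = 1;
    while (n < buf.size()) n *= 2;
    buf.resize(n, 0.0f);
    for (std::size_t stride = n / 2; stride > 0; stride /= 2)
        for (std::size_t t = 0; t < stride; ++t)  // "thread" t
            buf[t] += buf[t + stride];
    return buf[0];
}
```

For a layer whose sums have only a few terms, the synchronization between strides can cost more than it saves, which is one plausible reason for keeping the sequential reduction as the default.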

Measurements

The script test.sh (one in each folder) runs the measurements used to build the plots (run it with "bash test.sh"). "test.sh" produces one or more .csv files (depending on the implementation), which are saved in the folder "impl/${implementation}/test_results/measures/".

To execute "test.sh" it may first be necessary to clean it with the script clean_script.sh, using the command "bash clean_script.sh test.sh" (this step may be required because the editor used to write the scripts put a "\r" character at the end of each line, which bash does not handle).
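clean_script.sh itself is not shown here. Conceptually, the cleaning step amounts to deleting the carriage-return characters; a minimal C++ sketch of that transformation, with a hypothetical function name (the actual script may work differently, e.g. with sed or dos2unix):

```cpp
#include <algorithm>
#include <string>

// Removes every '\r', turning "\r\n" line endings into the "\n" bash expects.
std::string strip_carriage_returns(std::string text) {
    text.erase(std::remove(text.begin(), text.end(), '\r'), text.end());
    return text;
}
```

This uses the erase-remove idiom, which compacts the string in a single pass.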

The plots used in the report were created with a Python script called plot.py, which also computes all the metrics. The plots created with "plot.py" are saved in the folder "impl/${implementation}/test_results/plots/".
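This README does not list the metrics computed in plot.py. As an illustration only, assuming they include the usual speedup S(p) = T(1) / T(p) and strong-scaling efficiency E(p) = S(p) / p, the computation looks like this (shown in C++ for consistency with the rest of this page, although plot.py is Python):

```cpp
#include <cassert>

// Speedup of a run on p workers relative to the single-worker time t1.
double speedup(double t1, double tp) {
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

// Strong-scaling efficiency: 1.0 means perfectly linear speedup.
double efficiency(double t1, double tp, int p) {
    assert(p >= 1);
    return speedup(t1, tp) / p;
}
```

For example, a run that takes 8 s on one thread and 4 s on four threads has a speedup of 2 and an efficiency of 0.5.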

Hardware Info

Information about the hardware (CPU and GPU) used for this project can be found in the folder hardware_info.