APM parallelized for DNA analysis

DNA sequence comparison tool implementing an Approximate Pattern Matching (APM) algorithm based on the Levenshtein distance.
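As an illustration of the idea, here is a minimal Python sketch of approximate matching with the Levenshtein distance. It is not the project's code: the function names `levenshtein` and `count_approx_matches` are hypothetical, and the sliding-window strategy (comparing the pattern against every window of the same length) is an assumption about how such a tool could work.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def count_approx_matches(text: str, pattern: str, max_distance: int) -> int:
    """Count windows of len(pattern) in text within max_distance of pattern.

    Assumption: one fixed-size window per start position.
    """
    m = len(pattern)
    count = 0
    for start in range(len(text) - m + 1):
        if levenshtein(pattern, text[start:start + m]) <= max_distance:
            count += 1
    return count


if __name__ == "__main__":
    # An approximation distance of 0 reduces to exact matching.
    print(count_approx_matches("ACAGTCAG", "CAG", 0))
```

With this sketch, raising `max_distance` widens the set of accepted windows, which is the role the approximation distance plays in the commands below.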

Libraries used

This project combines OpenMP and MPI parallelization to run on a cluster of multicore machines.
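To give a rough picture of what a two-level (MPI processes, then OpenMP threads) decomposition can look like, the following Python sketch splits a range of start positions first across processes and then across threads. This is only an assumed scheme for illustration; the actual work distribution used by the binaries may differ, and `block_range` and `hybrid_ranges` are hypothetical names.

```python
def block_range(total: int, parts: int, index: int) -> range:
    """Return the index-th of `parts` contiguous, near-equal blocks of range(total)."""
    base, extra = divmod(total, parts)
    # The first `extra` blocks receive one additional element.
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return range(start, stop)


def hybrid_ranges(total: int, n_processes: int, n_threads: int) -> dict:
    """Map (process rank, thread id) to the positions that worker would scan."""
    plan = {}
    for rank in range(n_processes):
        chunk = block_range(total, n_processes, rank)     # per-process share
        for thread in range(n_threads):
            sub = block_range(len(chunk), n_threads, thread)  # per-thread share
            plan[(rank, thread)] = range(chunk.start + sub.start,
                                         chunk.start + sub.stop)
    return plan


if __name__ == "__main__":
    for worker, positions in hybrid_ranges(10, 2, 2).items():
        print(worker, list(positions))
```

Note that this sketch partitions start positions only. If the text itself were split between machines, each chunk would also need the following pattern-length minus one characters so that windows crossing a boundary are not missed.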

If you just want to compare the performance of the sequential and parallel implementations, you can use our test script:

# Basic script that tests 3 patterns on a one-line DNA file with an approximation distance of 0
./test.sh

To change the patterns, the approximation distance, or the test file, edit these lines in test.sh:

PATTERNS="CAG GTACAT GGG"   # List of patterns to test
FILE="dna/line_chrY.fa"     # DNA file
APPROXIMATION=0             # Approximation distance

To run a single implementation instead, follow these steps:

# Generate the binaries
make

Sequential:

./apm approximation_factor dna_database pattern1 pattern2 ...

With OpenMP:

export OMP_NUM_THREADS=number_of_cores
./apmOMP approximation_factor dna_database pattern1 pattern2 ...

With MPI:

mpirun -np number_of_machines -f hosts ./apmMPI approximation_factor dna_database pattern1 pattern2 ...

Hybrid (OpenMP + MPI):

export OMP_NUM_THREADS=number_of_cores
mpirun -np number_of_machines -f hosts ./apmParallel approximation_factor dna_database pattern1 pattern2 ...

To measure and plot the OpenMP speedup:

./run.sh | tee run.data
gnuplot speedup.sh
display SpeedUpAPM-OMP.png

To measure and plot the MPI speedup:

./mpi_speedup.sh | tee mpi_speedup.data
gnuplot mpi_speedup.sh
display SpeedUpAPM-MPI.png
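For reference, speedup is conventionally the sequential run time divided by the parallel run time, and efficiency is the speedup divided by the number of workers. The short Python sketch below shows that arithmetic; the timings are made-up placeholder values, not measurements from this project, and the layout of run.data and mpi_speedup.data is not assumed here.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Classic speedup: sequential time over parallel time."""
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, workers: int) -> float:
    """Fraction of ideal (linear) speedup achieved with `workers` workers."""
    return speedup(t_sequential, t_parallel) / workers


if __name__ == "__main__":
    # Hypothetical timings in seconds, keyed by number of workers.
    timings = {1: 8.0, 2: 4.0, 4: 2.5}
    t_seq = timings[1]
    for workers, t_par in timings.items():
        print(workers, speedup(t_seq, t_par), efficiency(t_seq, t_par, workers))
```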

A detailed benchmark and analysis of the results can be found in apm/results/Slides CSC5001.pdf.