DNA sequence comparison tool implementing an Approximate Pattern Matching (APM) algorithm based on the Levenshtein distance.
This project combines OpenMP and MPI parallelization to run on a cluster of multicore machines.
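As a rough illustration of what approximate pattern matching with the Levenshtein distance means, here is a minimal sequential sketch in C. It is not taken from the project sources: the function names `levenshtein` and `count_matches` are hypothetical, and the choice to compare the pattern against fixed-size windows of the text is an assumption made for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. */
static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Levenshtein distance between a[0..la) and b[0..lb), using a single
 * dynamic-programming row. Returns (size_t)-1 if allocation fails. */
size_t levenshtein(const char *a, size_t la, const char *b, size_t lb)
{
    size_t *row = malloc((lb + 1) * sizeof *row);
    if (row == NULL)
        return (size_t)-1;

    for (size_t j = 0; j <= lb; j++)
        row[j] = j;                      /* distance from the empty prefix of a */

    for (size_t i = 1; i <= la; i++) {
        size_t diag = row[0];            /* D[i-1][j-1] */
        row[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t up = row[j];          /* D[i-1][j] */
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            row[j] = min3(up + 1, row[j - 1] + 1, diag + cost);
            diag = up;
        }
    }

    size_t d = row[lb];
    free(row);
    return d;
}

/* Hypothetical helper: count the start positions in text whose window of
 * strlen(pattern) characters is within max_dist edits of pattern. */
size_t count_matches(const char *text, const char *pattern, size_t max_dist)
{
    size_t n = strlen(text), m = strlen(pattern), count = 0;
    if (m == 0 || m > n)
        return 0;
    for (size_t i = 0; i + m <= n; i++)
        if (levenshtein(pattern, m, text + i, m) <= max_dist)
            count++;
    return count;
}
```

With an approximation distance of 0 this reduces to exact matching, which is the setting used by the test script.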
To compare the performance of the sequential and parallel implementations, you can use our test script:
# Basic script that tests 3 patterns on a one-line DNA file with an approximation distance of 0
./test.sh
To change the patterns, the approximation distance, or the test file, edit these lines in test.sh:
PATTERNS="CAG GTACAT GGG" # List of patterns to test
FILE="dna/line_chrY.fa" # DNA file
APPROXIMATION=0 # Approximation distance
To run a single implementation instead, follow these steps:
# Generate the binaries
make
- Sequential execution:
./apm approximation_factor dna_database pattern1 pattern2 ...
- Parallel execution:
With OpenMP:
export OMP_NUM_THREADS=number_of_cores
./apmOMP approximation_factor dna_database pattern1 pattern2 ...
With MPI:
mpirun -np number_of_machines -f hosts ./apmMPI approximation_factor dna_database pattern1 pattern2 ...
Hybrid (OpenMP + MPI):
export OMP_NUM_THREADS=number_of_cores
mpirun -np number_of_machines -f hosts ./apmParallel approximation_factor dna_database pattern1 pattern2 ...
- OpenMP Benchmark:
./run.sh | tee run.data
gnuplot speedup.sh
display SpeedUpAPM-OMP.png
- MPI Benchmark:
./mpi_speedup.sh | tee mpi_speedup.data
gnuplot mpi_speedup.sh
display SpeedUpAPM-MPI.png
A detailed benchmark and analysis of the results can be found in apm/results/Slides CSC5001.pdf.
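To give an idea of what the OMP_NUM_THREADS setting controls in the OpenMP runs, here is a small sketch of how a match-counting loop could be shared among threads with a `parallel for` and a `reduction` clause. It is an assumption about the general approach, not the project's actual code: `count_matches_omp` is a hypothetical name, and a simple mismatch count (Hamming distance) stands in for the Levenshtein distance to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in distance for this sketch: number of mismatching characters
 * between two equal-length strings (Hamming distance, NOT Levenshtein). */
static size_t mismatches(const char *a, const char *b, size_t len)
{
    size_t d = 0;
    for (size_t i = 0; i < len; i++)
        d += (a[i] != b[i]);
    return d;
}

/* Hypothetical helper: count the windows of text within max_dist
 * mismatches of pattern, splitting the windows among OpenMP threads. */
size_t count_matches_omp(const char *text, const char *pattern, size_t max_dist)
{
    size_t n = strlen(text), m = strlen(pattern);
    if (m == 0 || m > n)
        return 0;

    long windows = (long)(n - m + 1);
    long count = 0;

    /* Each thread accumulates a private count; the partial counts are
     * summed when the loop ends. The number of threads is taken from
     * OMP_NUM_THREADS unless the program overrides it. */
    #pragma omp parallel for reduction(+:count)
    for (long i = 0; i < windows; i++) {
        if (mismatches(text + i, pattern, m) <= max_dist)
            count++;
    }
    return (size_t)count;
}
```

With GCC, the pragma takes effect only when the file is compiled with `-fopenmp`; without that flag the loop runs sequentially and returns the same result.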