/DPHPC

The aim of this course is to parallelise an algorithm.

Primary LanguageC++

DPHPC

In this project we will implement a parallelised suffix array building algorithm for fast string search (FSS). Our practical intentions are fast genome indexing.

Inspiration is drawn from the following papers:

Flick, Patrick, and Srinivas Aluru. “Parallel Distributed Memory Construction of Suffix and Longest Common Prefix Arrays.” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC ’15, ACM Press, 2015, pp. 1–10. DOI.org (Crossref), https://doi.org/10.1145/2807591.2807609.

Aziz M.M.A., Thulasiraman P., Mohammed N. (2020) Parallel Generalized Suffix Tree Construction for Genomic Data. In: Martín-Vide C., Vega-Rodríguez M., Wheeler T. (eds) Algorithms for Computational Biology. AlCoB 2020. Lecture Notes in Computer Science, vol 12099. Springer, Cham. https://doi.org/10.1007/978-3-030-42266-0_1

Tian, Y., Tata, S., Hankins, R.A. et al. Practical methods for constructing suffix trees. The VLDB Journal 14, 281–299 (2005). https://doi.org/10.1007/s00778-005-0154-8

Our project consists of the following steps:

  • Implement suffix array building algorithm
  • We will use MPI to coordinate across multiple working nodes

Compilation

The build environment with MPICH and google-benchmark can be built using the contained Dockerfile.

  • Compile:
mpicxx -O3 -pg  -std=c++11 main_mpi.cpp -o main
  • GPROF Anyalysis
gprof ./main  gmon.out
  • Euler
bsub -n 128 -R "rusage[mem=amount_of_mem_you_need (bytes)]" mpirun  ./main