Meta-Iterative Map-Reduce for massively parallel regression on a cluster, using MPI and CUDA to support both GPU and CPU nodes.

Primary language: CUDA. License: MIT.

Meta-Iterative Map-Reduce

Implementation of the Meta-Iterative Map-Reduce algorithm to perform distributed & scalable training of a machine learning model on a GPU+CPU cluster using CUDA-aware-MPI.

Authors: Shikhar Srivastava & Jawahar Reddy

Download the Project Report here.

What is Meta-Iterative Map Reduce?

Let's explain this using a bottom-up approach:

  • Map-Reduce is a programming model that parallelizes a large task by solving its sub-tasks in parallel. The sub-tasks are mapped to multiple 'workers' that solve their parts concurrently, and the outputs of the sub-tasks are then reduced (combined) into a solution to the primary task.

  • For tasks that benefit from iterations of computation, such as Machine Learning model training, the Map-Reduce operations are performed iteratively until a required solution is obtained. This programming model is therefore termed Iterative Map Reduce.

  • Now imagine that, instead of dividing the task into a single level of sub-tasks, we continue to divide the sub-tasks further into their own sub-(sub)-tasks, which themselves follow the Map-Reduce paradigm. Each 'worker' that was originally computing a sub-task now delegates work to a second level of workers. Performed iteratively, this composite of Map-Reduce operations is what we term Meta-Iterative MapReduce.
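
The three bullets above can be sketched in a few lines of host-only C++. This is a minimal single-process illustration, not the code in kernel.cu: the outer loop stands in for MPI processes, the inner loop stands in for CUDA threads, and the model is a one-variable linear regression trained by gradient descent. All names (`Grad`, `map_slice`, `inner_map_reduce`, `outer_map_reduce`, `train`) are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Partial sums of the squared-error gradient for the model y = w*x + b.
struct Grad {
    double dw = 0.0;
    double db = 0.0;
};

// Reduce step: combine two partial gradients by summing them.
Grad reduce_grads(const Grad& a, const Grad& b) {
    Grad g;
    g.dw = a.dw + b.dw;
    g.db = a.db + b.db;
    return g;
}

// Innermost map step: one worker's contribution over samples [lo, hi).
Grad map_slice(const std::vector<double>& x, const std::vector<double>& y,
               std::size_t lo, std::size_t hi, double w, double b) {
    Grad g;
    for (std::size_t i = lo; i < hi; ++i) {
        const double err = (w * x[i] + b) - y[i];
        g.dw += err * x[i];
        g.db += err;
    }
    return g;
}

// Level 2 (assumed stand-in for CUDA threads inside one process):
// split [lo, hi) into `threads` slices, map each slice, reduce the results.
Grad inner_map_reduce(const std::vector<double>& x, const std::vector<double>& y,
                      std::size_t lo, std::size_t hi, std::size_t threads,
                      double w, double b) {
    Grad total;
    const std::size_t n = hi - lo;
    for (std::size_t t = 0; t < threads; ++t) {
        const std::size_t s = lo + n * t / threads;
        const std::size_t e = lo + n * (t + 1) / threads;
        total = reduce_grads(total, map_slice(x, y, s, e, w, b));
    }
    return total;
}

// Level 1 (assumed stand-in for MPI processes): the map operation of each
// "process" is itself a complete level-2 map-reduce over its chunk.
Grad outer_map_reduce(const std::vector<double>& x, const std::vector<double>& y,
                      std::size_t procs, std::size_t threads, double w, double b) {
    Grad total;
    const std::size_t n = x.size();
    for (std::size_t p = 0; p < procs; ++p) {
        const std::size_t s = n * p / procs;
        const std::size_t e = n * (p + 1) / procs;
        total = reduce_grads(total, inner_map_reduce(x, y, s, e, threads, w, b));
    }
    return total;
}

struct Model {
    double w = 0.0;
    double b = 0.0;
};

// Iterative part: repeat the two-level map-reduce, then apply one
// gradient-descent update, for a fixed number of iterations.
Model train(const std::vector<double>& x, const std::vector<double>& y,
            std::size_t procs, std::size_t threads, double lr, int iters) {
    assert(x.size() == y.size() && !x.empty());
    Model m;
    const double n = static_cast<double>(x.size());
    for (int it = 0; it < iters; ++it) {
        const Grad g = outer_map_reduce(x, y, procs, threads, m.w, m.b);
        m.w -= lr * g.dw / n;
        m.b -= lr * g.db / n;
    }
    return m;
}
```

Calling `train(x, y, 2, 4, 0.05, 5000)` on eight points sampled from y = 2x + 1 (x = 0, 1, ..., 7) should drive `w` toward 2 and `b` toward 1. In a real deployment the two loops would run concurrently, and the two reduce steps would typically be a GPU reduction and an MPI collective; here they run sequentially only to show the nesting.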

Advantages of Meta-Iterative MapReduce

  1. An ideal speed-up on the order of (number of parallel MPI processes) × (number of CUDA kernel threads).

  2. Using the Meta model, we can effectively leverage CUDA-aware-MPI. Thus,

    • All operations required to carry out a message transfer (i.e., a send operation) can be pipelined.

    • Acceleration technologies like GPUDirect can be utilized by the MPI library transparently to the user.

  3. Iterative MapReduce has significant applications in massively parallel, complex computations that are performed iteratively, such as modern Deep Learning workloads, which impose strict requirements on both data storage and floating-point throughput.
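
To make the speed-up figure in item 1 concrete, the following host-only C++ sketch computes the worker count and an idealized bound for the map phase. It assumes the samples are spread evenly over all workers and ignores communication, reduction, and kernel-launch costs, so it is an upper bound rather than a measured result. The function names are hypothetical.

```cpp
#include <cstddef>

// Total concurrent workers when each of `procs` MPI processes launches
// `threads` CUDA threads. Assumes procs >= 1 and threads >= 1.
std::size_t total_workers(std::size_t procs, std::size_t threads) {
    return procs * threads;
}

// Samples handled by the busiest worker when `n` samples are spread
// as evenly as possible over all workers (ceiling division).
std::size_t samples_per_worker(std::size_t n, std::size_t procs,
                               std::size_t threads) {
    const std::size_t workers = total_workers(procs, threads);
    return (n + workers - 1) / workers;
}

// Idealized speed-up of the map phase over a single sequential worker.
// Assumes n >= 1; overheads are deliberately left out of this model.
double ideal_map_speedup(std::size_t n, std::size_t procs,
                         std::size_t threads) {
    return static_cast<double>(n) /
           static_cast<double>(samples_per_worker(n, procs, threads));
}
```

With 4 processes and 256 threads each, 1,048,576 samples give 1,024 samples per worker and an ideal speed-up of 1,024, which equals processes × threads. With only 1,000 samples the same configuration is capped at 1,000, because there are more workers than samples.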


Dependencies:

  • CUDA Toolkit (ver >= 7.0)

  • Microsoft MPI or OpenMPI (tested on Microsoft MPI ver 8.0)

  • NVIDIA graphics card (CUDA-capable GPU)

Installation:

  1. Clone the repository: git clone https://github.com/soilad/Meta-Iterative-MapReduce.git

  2. Ensure that the directory containing the cuda.h header file is on the include path of your IDE or mpicc compiler.

  3. Compile the kernel.cu file using the MPI compiler.

    • For Microsoft MPI,

      ```sh
      mpicc kernel.cu -o metamap
      ```
      
    • For Open MPI,

    Refer to https://www.open-mpi.org/faq/?category=runcuda & https://www.open-mpi.org/faq/?category=buildcuda

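  4. Execute the compiled binary: $ ./metamap. To run with more than one MPI process, start it through your MPI launcher, for example mpiexec -n 4 ./metamap (the process count of 4 is only an example).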

Key terms:

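  • CUDA-aware MPI: An MPI implementation that accepts GPU device pointers directly in its communication calls, so GPU buffers can be sent and received without explicit staging through host memory. https://devblogs.nvidia.com/introduction-cuda-aware-mpi/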
  • Iterative MapReduce: The Map-Reduce paradigm adapted for iterative operations, for example in Machine Learning model training. https://deeplearning4j.org/iterativereduce
  • Meta-Iterative MapReduce: We (the authors) propose a model that performs two "levels" of iterative map-reduce operations. The gist is that each map operation in the first level of map-reduce is itself a composite of a second level of map-reduce operations. We argue that efficiency bounds are better this way.
  • [Linear] Regression: To showcase the improvement in model training speed, we perform distributed training of a linear regression model using the Meta-Iterative Map-Reduce programming model.