Swarm

A library written in pure Kotlin for parallel computations using the map programming paradigm. It can run on top of MPI (for multi-computer computation) or on threads (for a single machine).

Installation

Add the following lines to your Gradle build script:

repositories {
    mavenCentral()
}

dependencies {
    implementation group: 'com.github.inforion', name: 'swarm', version: '0.2.0'
}
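
If your build script uses the Gradle Kotlin DSL (build.gradle.kts) instead, the equivalent declaration should look like this:

repositories {
    mavenCentral()
}

dependencies {
    implementation("com.github.inforion:swarm:0.2.0")
}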

MPI/OpenMPI

That's it: for the threads backend you are ready to go. If you want real cluster computation, you should also install the underlying MPI library. Swarm uses the Java bindings of the OpenMPI library, so to run Swarm on top of MPI you must compile OpenMPI with Java bindings support. There is nothing complicated about it: just clone the OpenMPI sources and build them with the --enable-mpi-java option. A reasonably complete guide is in the OpenMPI FAQ: https://www.open-mpi.org/faq/?category=java

When you assemble a program that uses mpiSwarm, you have to add the location of mpi.jar to the classpath. This file is built during OpenMPI compilation with Java support. Also, the underlying low-level OpenMPI libraries must be in your LD_LIBRARY_PATH or otherwise visible to the Java native loader.
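
One way to put mpi.jar on the classpath is a plain file dependency in the Gradle build script; the path below is hypothetical and depends on where your OpenMPI build placed the file:

dependencies {
    // hypothetical location - point this at the mpi.jar produced by your OpenMPI build
    implementation(files("/opt/openmpi/lib/mpi.jar"))
}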

NOTE: although compilation isn't hard and Swarm hides the direct work with MPI, cluster computation is still not for the faint of heart.

Parallel concept of Swarm

In cluster programming (and in the threads case, which simulates a cluster), the following terms are commonly used: master and slave nodes.

  • master node - the main node, which gets control first of all
  • slave node - a node that waits for a task (from the master's map command) and then does the job

In Swarm, we always have one master node and as many slave nodes as specified when Swarm starts.

Also, the program as a whole has three main parts (illustrated by the sketch after this list):

  • common code - executed on all nodes (master and slaves)
  • master code - executed only on the master node
  • slave code - executed only on slave nodes
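
A minimal sketch of how these three parts line up in code; the wildcard import is an assumption about the library's package name and should be checked against your version:

// import path is an assumption; verify against the library sources
import ru.inforion.lab403.swarm.*

// common code: executed on every node, master and slaves alike
val input = listOf(1, 2, 3)

fun main() {
    threadsSwarm(2) { swarm ->
        // master code: executed only on the master node
        val doubled = input
            .parallelize(swarm)
            .map {
                // slave code: executed on slave nodes in parallel
                it * 2
            }
        println(doubled)  // [2, 4, 6]
    }
}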

Usage

First, we should create a 'split point' or 'entry point' for Swarm. This can be done using one of the functions threadsSwarm or mpiSwarm, depending on which underlying distributed computation layer we want to use, e.g.:

// this code executes for all nodes: master and slaves

threadsSwarm(4) {  // here we specify 4 slave nodes
    // now we are part of the swarm
    // this code executes only for master node
}

After we become a part of the swarm, we have two main types of possible actions: interaction with other nodes and the map computation itself. If the computation task requires no pre-computation, i.e. no context is needed on the slaves, then the map function can be used:

threadsSwarm(size) { swarm ->
    // this code executes only for master node
    val result = listOf("ab", "cd", "dd")
        .parallelize(swarm)  // wraps the collection in a special ParallelIterable for parallelization
        .map {  // the API is similar for other map-like functions
            // this code will be executed on slaves nodes in parallel
            it.toUpperCase()
        }
    
    // result = listOf("AB", "CD", "DD")
}

But in some cases we may need to set up a context on all nodes before the computation; for that, the context method of the Swarm object may be used:

// Context class; Heavy is some object that is expensive to compute, so we should create it once before the map
class Context(val node: Int, val heavy: Heavy)

threadsSwarm(size) { swarm ->
    swarm.context {  // sets up the context on the nodes; 'it' is the node number
        val heavy = initializeHeavy(...)
        Context(it, heavy)
    }
    
    swarm
        .get { context: Context -> context.heavy.someField }  // using get we can collect information from all slaves
        .forEach { println(it) }  // this executes only on master (simple forEach)

    val result = listOf("ab", "cd", "dd")
        .parallelize(swarm)
        .mapContext { context: Context, value ->  // use mapContext to receive the previously set context along with each value
            context.heavy.run(value)
        }
    
    // result = the list of context.heavy.run(...) applied to "ab", "cd", "dd"
}

Besides the map and mapContext methods of ParallelIterable, a filter method is also available (see the sketch below). A number of usage examples can be found in the SwarmTests.
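
A minimal sketch of filter, assuming it mirrors the map API shown above with the predicate evaluated on the slave nodes:

threadsSwarm(4) { swarm ->
    val result = listOf(1, 2, 3, 4, 5, 6)
        .parallelize(swarm)
        .filter {
            // the predicate is evaluated on slave nodes in parallel
            it % 2 == 0
        }

    // result = listOf(2, 4, 6)
}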