/masuria

distributed framework for processing highly relational data

Primary LanguageJava

Project description

Masuria is a system for distributed processing of relational data, ie. data naturally described as graphs. It has following characteristics/goals:

  • scalable and fault tolerant;
  • easy to use with simple programming model;
  • extensible APIs, modular and pluggable implementation based on OSGi technology;
  • suitable for adaptation to a number of existing or future storage technologies.

For introduction to the computational paradigm upon which the framework is based (bulk synchronous parallel) see this presentation .

Sample Code

The following is a job definition that computes Single Source Shortest Paths from the starting vertex in a graph to all other vertices in that graph. The algorithm is taken from the Google Pregel paper3.

public class SSSPJob extends JobBase<Neo4JElement, DistanceMessage>
                         implements IJob<Neo4JElement, DistanceMessage> {

  public SSSPJob(final IElementId id, final IProgram program,
                 final JobExecutionContext executionContext) {

    super(id, program, executionContext);
  }

  @Override
  public void superStep() {

    int minDist = this.isStartingElement() ? 0 : Integer.MAX_VALUE;
    
    for(DistanceMessage msg: messages()) { // Choose min. proposed distance 

      minDist = Math.min(minDist, msg.getDistance() );

    }
    
    IVertex v = this.getElement();
            
    if( minDist < this.getCurrentDistance() ) { //If minDist improves current path, store and propagate
      
      this.setCurrentDistance(minDist);
      
      for(IEdge r: v.getOutgoingEdges(DemoRelationshipTypes.KNOWS) )  {
        
        IElement recipient = r.getOtherElement(v);
        int rDist = this.getLengthOf(r);
        this.sendMessage( new DistanceMessage(minDist+rDist, recipient.getId(), this.getId()) );        
      }
      
    }
    
  }

Getting Started

While the targeted execution environment for the framework is an OSGi container, simple examples can be run locally via executable class MasuriaDemoDriver5.

Development status

This is a prototype created mainly to identify main components and develop adequate intuitions around distributed OSGi development. The system can be run in a single-machine, single threaded mode.

TODO

  • OSGIfying build and components
    • provide master and peer applications
    • bundle-based program/job management
    • osgi-enabled configuration management and logging
  • capability to run in a distributed environment
  • add aggregate/reduce framework
  • add combiner mechanism

Related Work

  • Apache Hama2
  • Google Pregel3
  • CMU GraphLab4
  • Zurich University Signal/Collect6
  • Microsoft Research Trinity7
    …and a number of other projects.

References

1 ‘A bridging model for parallel computation’, Leslie G. Valiant

2 Apache Hama

3 Google Pregel

4 GraphLab

5 MasuriaDemoDriver.java

6 Signal/collect

7 Microsoft Research Trinity