Masuria is a system for distributed processing of relational data, ie. data naturally described as graphs. It has following characteristics/goals:
- scalable and fault tolerant;
- easy to use with simple programming model;
- extensible APIs, modular and pluggable implementation based on OSGi technology;
- suitable for adaptation to a number of existing or future storage technologies.
For introduction to the computational paradigm upon which the framework is based (bulk synchronous parallel) see this presentation .
The following is a job definition that computes Single Source Shortest Paths from the starting vertex in a graph to all other vertices in that graph. The algorithm is taken from the Google Pregel paper3.
While the targeted execution environment for the framework is an OSGi container, simple examples can be run locally via executable class MasuriaDemoDriver5. This is a prototype created mainly to identify main components and develop adequate intuitions around distributed OSGi development. The system can be run in a single-machine, single threaded mode.public class SSSPJob extends JobBase<Neo4JElement, DistanceMessage> implements IJob<Neo4JElement, DistanceMessage> {
public SSSPJob(final IElementId id, final IProgram program, final JobExecutionContext executionContext) {
super(id, program, executionContext); }
@Override public void superStep() {
int minDist = this.isStartingElement() ? 0 : Integer.MAX_VALUE;
for(DistanceMessage msg: messages()) { // Choose min. proposed distance
minDist = Math.min(minDist, msg.getDistance() );
}
IVertex v = this.getElement();
if( minDist < this.getCurrentDistance() ) { //If minDist improves current path, store and propagate
this.setCurrentDistance(minDist);
for(IEdge r: v.getOutgoingEdges(DemoRelationshipTypes.KNOWS) ) {
IElement recipient = r.getOtherElement(v); int rDist = this.getLengthOf(r); this.sendMessage( new DistanceMessage(minDist+rDist, recipient.getId(), this.getId()) ); }
}
}
- OSGIfying build and components
- provide master and peer applications
- bundle-based program/job management
- osgi-enabled configuration management and logging
- capability to run in a distributed environment
- add aggregate/reduce framework
- add combiner mechanism
- Apache Hama2
- Google Pregel3
- CMU GraphLab4
- Zurich University Signal/Collect6
- Microsoft Research Trinity7
…and a number of other projects.
1 ‘A bridging model for parallel computation’, Leslie G. Valiant