/MapReduce

An easy-to-use Map Reduce Go parallel-computing framework inspired by 2021 6.824 lab1. It supports multiple workers threads on a single machine and multiple processes on a single machine right now.

Primary LanguageGoMIT LicenseMIT

MapReduce

github build status Go Reference

This is an easy-to-use Map Reduce Go framework inspired by 2021 6.824 lab1.

mapReduce

Feature

  • Multiple workers goroutine in a program on a single machine.
  • Multiple workers process in separate program on a single machine.
  • Fault tolerance.
  • Easy to parallel your code with just Map and Reduce function.

Library Usage - Your own map and reduce function

Here's a simply example for word count program. wc.go

package main
import (
	"strconv"
	"strings"
	"unicode"

	"github.com/BWbwchen/MapReduce/worker"
)
func Map(filename string, contents string, ctx worker.MrContext) {
	// function to detect word separators.
	ff := func(r rune) bool { return !unicode.IsLetter(r) }

	// split contents into an array of words.
	words := strings.FieldsFunc(contents, ff)

	for _, w := range words {
		ctx.EmitIntermediate(w, "1")
	}
}
func Reduce(key string, values []string, ctx worker.MrContext) {
	// return the number of occurrences of this word.
	ctx.Emit(key, strconv.Itoa(len(values)))
}

Usage - 1 program with master, worker goroutine

main.go

package main

import (
	mp "github.com/BWbwchen/MapReduce"
)

func main() {
	mp.StartSingleMachineJob(mp.ParseArg())
}

Run with :

# Compile plugin
go build -race -buildmode=plugin -o wc.so wc.go

# Word count
go run -race main.go -i 'input/files' -p 'wc.so' -r 1 -w 8

Output file name is mr-out-0.txt

More example can be found in the mrapps/ folder, and we will add more example in the future.

Usage - Master program, and worker program (Isolate master and workers)

master.go

package main

import (
	mp "github.com/BWbwchen/MapReduce"
)

func main() {
	mp.StartMaster(mp.ParseArg())
}

worker.go

package main

import (
	mp "github.com/BWbwchen/MapReduce"
)

func main() {
	mp.StartWorker(mp.ParseArg())
}

Run with :

# Compile plugin
go build -race -buildmode=plugin -o wc.so wc.go

# Word count
go run -race cmd/master.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 8 &
sleep 1
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 1 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 2 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 3 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 4 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 5 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 6 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 7 &
go run -race cmd/worker.go -i 'txt/*' -p 'cmd/wc.so' -r 1 -w 8 

Help

MapReudce is an easy-to-use Map Reduce Go parallel-computing framework inspired by 2021 6.824 lab1.
It supports multiple workers threads on a single machine and multiple processes on a single machine right now.

Usage:
  mapreduce [flags]

Flags:
  -h, --help            help for mapreduce
  -m, --inRAM           Whether write the intermediate file in RAM (default true)
  -i, --input strings   Input files
  -p, --plugin string   Plugin .so file
      --port int        Port number (default 10000)
  -r, --reduce int      Number of Reducers (default 1)
  -w, --worker int      Number of Workers(for master node)
                        ID of worker(for worker node) (default 4)

Contributions

Pull requests are always welcome!

Made by Bo-Wei Chen. All code is licensed under the MIT License.