This repository contains MapReduce source code and instructions for running it on Docker, based on the setup from https://github.com/qducnguyen/hadoop-docker with optimizations.
This setup has five major parts:
- assets: This folder contains the binaries for Hadoop and Java. Please download the Hadoop 3.3.6 and JDK 8 binaries, rename them to hadoop-3.3.6.tar.gz and jdk-8u202-linux-x64.tar.gz respectively, and put them in the 'assets' folder for the setup to work properly.
- config-files: All Hadoop configuration files (${HADOOP_HOME}/etc/hadoop/) are kept here.
- gnome-kmer-counting: Mapper and Reducer for the genome k-mer counting exercise from HW1, using Hadoop Streaming.
- scripts: Scripts for building, running, and cleaning up the Docker images and containers for this repo.
- mapred-src: Scripts for HW2: MapReduce in Python Streaming and Java.
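
To illustrate the Hadoop Streaming pattern that gnome-kmer-counting relies on, here is a minimal sketch of a k-mer mapper and reducer in Python. It is not the code in that folder: the value of `K`, the function names, and the choice to skip FASTA header lines and treat each input line independently (so k-mers spanning a line break are not counted) are all assumptions made for the example.

```python
import sys
from itertools import groupby

K = 3  # assumed k-mer length; the actual exercise may use a different value


def map_kmers(lines, k=K):
    """Emit (kmer, 1) for every k-mer found in each sequence line."""
    for line in lines:
        seq = line.strip().upper()
        if not seq or seq.startswith(">"):  # skip blank and FASTA header lines
            continue
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k], 1


def parse_pairs(lines):
    """Parse tab-separated 'kmer<TAB>count' lines, as produced by the mapper."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        kmer, count = line.split("\t")
        yield kmer, int(count)


def reduce_counts(pairs):
    """Sum counts per k-mer; input must already be sorted by k-mer."""
    for kmer, group in groupby(pairs, key=lambda kv: kv[0]):
        yield kmer, sum(count for _, count in group)


if __name__ == "__main__":
    # Hypothetical convention: first argument selects the role ("map" or "reduce").
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    if role == "map":
        results = map_kmers(sys.stdin)
    else:
        results = reduce_counts(parse_pairs(sys.stdin))
    for kmer, count in results:
        print(f"{kmer}\t{count}")
```

Hadoop Streaming sorts mapper output by key before it reaches the reducer, which is why the reducer can group consecutive lines. Saved under a hypothetical name such as kmer.py, the same flow can be tried locally by piping the map step through `sort` into the reduce step.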
All assets are available in this folder. Please download the right files for the 'assets' folder.
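
Before building the images, it can help to confirm that both archives are in place under the expected names. The following is a minimal sketch of such a check; the helper name `missing_assets` is hypothetical, and the script assumes it is run from the repository root.

```python
from pathlib import Path

# File names the setup expects, as listed in the 'assets' description above.
REQUIRED_ASSETS = ("hadoop-3.3.6.tar.gz", "jdk-8u202-linux-x64.tar.gz")


def missing_assets(assets_dir="assets"):
    """Return the required archive names that are not present in assets_dir."""
    root = Path(assets_dir)
    return [name for name in REQUIRED_ASSETS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing from assets/: " + ", ".join(missing))
    else:
        print("All required assets are present.")
```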
Please run the scripts in the 'scripts' folder in order.
Notes for running the Java code:
- Install the latest Java 8 release suitable for your OS.
- Follow the Java for VS Code tutorial to use the Maven project at mapred-src/maven.
- Remember to export the JAR in VS Code with the option "without main class".