Big Data Tools and Examples
MapReduce, HDFS, Hadoop, Hive, Pig, Spark, HBase, Sqoop, Oozie
Compile and Run Java MapReduce Programs on VirtualBox
- Start up your VM
- Write your driver source code using a text editor such as vi (or emacs):
vi MaxTemperature.java
- Write your mapper and reducer source code:
vi MaxTemperatureMapper.java
vi MaxTemperatureReducer.java
- Compile your Java code:
java -version
yarn classpath
javac -classpath `yarn classpath` -d . MaxTemperatureMapper.java
javac -classpath `yarn classpath` -d . MaxTemperatureReducer.java
javac -classpath `yarn classpath`:. -d . MaxTemperature.java
- Create your jar file:
jar -cvf maxTemp.jar *.class
- Create your input data file on the local file system
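One way to create a small sample file is a here-document, sketched below. The record layout (a year and a temperature separated by a space) and the values are assumptions for illustration only; use whatever format your mapper actually parses.

```shell
# Hypothetical sample data: one "year temperature" pair per line (assumed format).
cat > temperatureInputs.txt <<'EOF'
1949 111
1949 78
1950 0
1950 22
1950 -11
EOF

# Confirm the file was written as expected.
cat temperatureInputs.txt
```

Keeping the sample small makes it easy to work out the expected maximum for each year by hand.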
- Put your input data file into HDFS:
hdfs dfs -ls /
hdfs dfs -ls /user
hdfs dfs -ls /user/cloudera
hdfs dfs -mkdir /user/cloudera/class1
hdfs dfs -put temperatureInputs.txt /user/cloudera/class1
hdfs dfs -cat /user/cloudera/class1/temperatureInputs.txt
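- Run your MapReduce program (Hadoop normally refuses to write into an output directory that already exists, so pick a new one for each run):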
hadoop jar maxTemp.jar MaxTemperature /user/cloudera/class1/temperatureInputs.txt /user/cloudera/class1/output
- Verify that the program ran and the results are correct:
hdfs dfs -ls /user/cloudera/class1/output
hdfs dfs -cat /user/cloudera/class1/output/part-r-00000
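To check correctness rather than just eyeball the output, the expected result can be computed locally and compared with the job's output. The sketch below assumes the input holds one "year temperature" pair per line and that the job writes the year and its maximum separated by a tab (the default for text output); both are assumptions, as is the file name expected.txt.

```shell
# Create a hypothetical sample input only if no input file exists yet
# (assumed format: "year temperature" per line).
[ -f temperatureInputs.txt ] || printf '%s\n' \
  '1949 111' '1949 78' '1950 0' '1950 22' '1950 -11' > temperatureInputs.txt

# Compute the maximum temperature per year locally, tab-separated and sorted.
awk '{ if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0 }
     END { for (y in max) print y "\t" max[y] }' temperatureInputs.txt \
  | sort > expected.txt
cat expected.txt

# On the VM, compare against the job output (no output from diff means a match):
# hdfs dfs -cat /user/cloudera/class1/output/part-r-00000 | sort | diff - expected.txt
```

The awk step plays the role of both mapper and reducer in a single pass, so it is only practical for inputs small enough to process on one machine.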