
Hadoop, Spark, and related tools: Hive, Pig, HBase, Sqoop, Oozie, etc.

Primary language: Java

Big Data Tools and Examples

MapReduce, HDFS, Hadoop, Hive, Pig, Spark, HBase, Sqoop, Oozie

Using git push on CentOS

Compile and Run Java MapReduce Programs on VirtualBox

  1. Start up your VM.
  1. Write your driver source code using a text editor such as vi (or emacs):
      vi MaxTemperature.java
  1. Write your mapper and reducer source code:
      vi MaxTemperatureMapper.java
      vi MaxTemperatureReducer.java
  1. Compile your Java code:
      java -version
      yarn classpath
      javac -classpath `yarn classpath` -d . MaxTemperatureMapper.java
      javac -classpath `yarn classpath` -d . MaxTemperatureReducer.java
      javac -classpath `yarn classpath`:. -d . MaxTemperature.java
  1. Create your jar file:
      jar -cvf maxTemp.jar *.class
  1. Create your input data file on the local file system:
      vi temperatureInputs.txt
  1. Put your input data file into HDFS:
      hdfs dfs -ls /
      hdfs dfs -ls /user
      hdfs dfs -ls /user/cloudera
      hdfs dfs -mkdir /user/cloudera/class1
      hdfs dfs -put temperatureInputs.txt /user/cloudera/class1
      hdfs dfs -cat /user/cloudera/class1/temperatureInputs.txt
  1. Run your MapReduce program (the output directory must not already exist):
      hadoop jar maxTemp.jar MaxTemperature /user/cloudera/class1/temperatureInputs.txt /user/cloudera/class1/output
  1. Verify that the program ran and the results are correct:
      hdfs dfs -ls /user/cloudera/class1/output
      hdfs dfs -cat /user/cloudera/class1/output/part-r-00000
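
One way to check that the results are correct is to compute the expected maxima locally and compare them with part-r-00000. The following is a minimal sketch in plain Java, not the MapReduce job itself and not necessarily how the author's classes work. It assumes each line of temperatureInputs.txt has the form `<year><TAB><temperature>`; that layout, the class name `LocalMaxTemperature`, and the method `maxByYear` are illustrative assumptions, so adjust the parsing to match your actual input file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local cross-check; it does not use Hadoop at all.
public class LocalMaxTemperature {

    // Assumed record layout (not from the original README): "<year>\t<temperature>".
    static Map<String, Integer> maxByYear(List<String> lines) {
        // TreeMap keeps the years sorted, so the printed order is easy to compare
        // with the key-ordered output of a single reducer.
        Map<String, Integer> max = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\t");
            if (fields.length != 2) {
                continue; // skip blank or malformed lines
            }
            int temperature;
            try {
                temperature = Integer.parseInt(fields[1].trim());
            } catch (NumberFormatException e) {
                continue; // skip lines whose temperature is not an integer
            }
            // Keep the larger of the stored value and the new reading.
            max.merge(fields[0], temperature, Integer::max);
        }
        return max;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to a small made-up sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("1950\t0", "1950\t22", "1949\t111", "1949\t78", "1950\t-11");
        for (Map.Entry<String, Integer> entry : maxByYear(lines).entrySet()) {
            // Key, tab, value: the default text output layout of a MapReduce job.
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Compile it with `javac LocalMaxTemperature.java` and run `java LocalMaxTemperature temperatureInputs.txt` in the directory holding the local input file. With the made-up sample (no argument), it prints `1949` paired with `111` and `1950` paired with `22`, one tab-separated pair per line.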