Hadoop Testbench

This repository is my testbench for learning Big Data with the book Hadoop: The Definitive Guide, Fourth Edition by Tom White (O'Reilly, 2014).

### Content

  1. WordCount contains a MapReduce program that counts how many times each word occurs in the given input file. The input can be any text file.
  2. MaxTemperature contains a MapReduce program that finds the maximum temperature for each year in the input file. A sample input file can be downloaded from the Hadoop book's repository. To run the program on more data, refer to the book's website, which provides larger input files.
  3. Cat contains a simple FileSystem operation that prints the contents of an input file to the screen. Example usage: `hadoop jar hadoop-testbench-1.0-SNAPSHOT.jar com.praveen.FileSystem.Cat /sample.txt`. Here `/sample.txt` is the location of the file on HDFS.
  4. CatSeek is the same as Cat, but it skips a chosen number of characters at the beginning of the file before displaying the rest. Example usage: `hadoop jar hadoop-testbench-1.0-SNAPSHOT.jar com.praveen.FileSystem.CatSeek /sample.txt 10`. Here `/sample.txt` is the location of the file on HDFS, and the last parameter, `10`, is the number of characters to skip.
  5. FileCopy copies a file from the local file system to HDFS. A Progressable is passed as a lambda function to the create method, which prints a dot as the copy progresses. Example usage: `hadoop jar hadoop-testbench-1.0-SNAPSHOT.jar com.praveen.FileSystem.FileCopy sample.txt /sample.txt`. Here the first argument is the location of the local file and the second is the destination in HDFS.
  6. FileStatus uses the FileSystem.getFileStatus method to print information about the file and directory given as input arguments. Example usage: `hadoop jar hadoop-testbench-1.0-SNAPSHOT.jar com.praveen.FileSystem.FileStatus /india_weather.csv /testbench`
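
For readers new to MapReduce, the flow behind WordCount can be sketched in plain Java without any Hadoop classes. This is only an illustration of the map, shuffle, and reduce steps, not the code in this repository: the class name `WordCountSketch` and its methods are hypothetical, and splitting on whitespace is an assumption about how words are tokenized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map -> shuffle -> reduce flow; no Hadoop classes are used.
// All names here are hypothetical and chosen for illustration only.
final class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all the counts collected for a single word.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: map every line, group the emitted values by key (the "shuffle"),
    // then reduce each group to a single total.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Prints one total per distinct word found in the sample lines.
        System.out.println(run(List.of("the quick brown fox", "the lazy dog")));
    }
}
```

MaxTemperature follows the same shape: the map step would emit (year, temperature) pairs, and the reduce step would take the maximum of each group instead of the sum.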