Primary Language: Java

HiveEG

Author: Suman Lama

Description:

  • A demo project that illustrates the use of Hive.
  • The project uses the yum.log file of a Red Hat Linux system. It loads the log entries into Hive as a text file, with entryDate and entryMessage as the two columns.

Steps used:

  • Add the jar files in the hive directory to the project.
  • Create a database with: CREATE DATABASE databaseName;
  • Create a table with CREATE TABLE tableName, using SerDe (Serializer/Deserializer) properties as follows. (A SerDe tells Hive how to parse rows when reading the data and how to serialize them when writing to HDFS. In this case the parsing is done with a regular expression.)
    • Set the row format with a regex SerDe, using the following regex to split each line into two groups: (^[a-zA-Z]{3} \d{2} \d{2}:\d{2}:\d{2}) (.*)
    • Store the table as TEXTFILE with "'output.format.string' = '%1$s %2$s'", where %1$s and %2$s refer to the two capture groups of the regex above.
  • Load the log file with: LOAD DATA LOCAL INPATH {location of log} INTO TABLE HiveDemoTable;
  • Run a query, for example: SELECT DISTINCT(entryDate) FROM hivedemotable;
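
To see what the regex and the output format string do to a single log line, the following standalone Java sketch applies the same pattern outside Hive. The class name YumLogRegexDemo and the sample log line are made up for illustration; they are not part of this project, and the sketch only mimics the splitting step, not the SerDe itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the project: shows how the regex splits a line.
public class YumLogRegexDemo {
    // Same regex as in the steps above: group 1 is the timestamp, group 2 the message.
    static final Pattern LINE =
        Pattern.compile("(^[a-zA-Z]{3} \\d{2} \\d{2}:\\d{2}:\\d{2}) (.*)");

    // Returns {entryDate, entryMessage}, or null when the line does not match.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // Mirrors 'output.format.string' = '%1$s %2$s': both groups joined by a space.
    static String format(String[] cols) {
        return String.format("%1$s %2$s", cols[0], cols[1]);
    }

    public static void main(String[] args) {
        // Made-up sample line in the shape the regex expects.
        String sample = "Aug 10 09:15:42 Installed: example-package-1.0-1.x86_64";
        String[] cols = split(sample);
        System.out.println("entryDate    = " + cols[0]);
        System.out.println("entryMessage = " + cols[1]);
        System.out.println(format(cols));
    }
}
```

Note that the first group captures the full timestamp (month, day, and time), so entryDate holds values such as "Aug 10 09:15:42" rather than the date alone.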

Steps to Run

- Import the project into Eclipse and run Main to create the database and table and load the data.
- Run ReadAll to read all entries for Aug 10.
- Run ReadDistinct to read the distinct entry dates.
- Run DropTable to drop the table.
[NOTE: Run Main again after dropping the table.]
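
For readers who want to see the overall flow in one place, here is a minimal JDBC sketch of the create, load, and query steps. It is not the project's Main class. It assumes a HiveServer2 instance at jdbc:hive2://localhost:10000/default, the Hive JDBC driver on the classpath, the contrib RegexSerDe (org.apache.hadoop.hive.contrib.serde2.RegexSerDe) available to Hive, and a log at /var/log/yum.log. The class name HiveDemoSketch and its helper methods are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch, not the project's code.
public class HiveDemoSketch {
    // Assumed HiveServer2 endpoint; adjust host, port, and database to your setup.
    static final String URL = "jdbc:hive2://localhost:10000/default";

    // Builds the CREATE TABLE statement described in "Steps used".
    // Backslashes are doubled for Hive's string literal (and doubled again for Java).
    static String createTableSql(String table) {
        return "CREATE TABLE IF NOT EXISTS " + table
            + " (entryDate STRING, entryMessage STRING)"
            + " ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'"
            + " WITH SERDEPROPERTIES ("
            + " 'input.regex' = '(^[a-zA-Z]{3} \\\\d{2} \\\\d{2}:\\\\d{2}:\\\\d{2}) (.*)',"
            + " 'output.format.string' = '%1$s %2$s')"
            + " STORED AS TEXTFILE";
    }

    // Builds the LOAD DATA statement for a log file path.
    static String loadSql(String path, String table) {
        return "LOAD DATA LOCAL INPATH '" + path + "' INTO TABLE " + table;
    }

    public static void main(String[] args) throws SQLException {
        // Assumed default location of the log; pass another path as the first argument.
        String logPath = args.length > 0 ? args[0] : "/var/log/yum.log";
        try (Connection con = DriverManager.getConnection(URL, "", "");
             Statement st = con.createStatement()) {
            st.execute(createTableSql("hivedemotable"));
            st.execute(loadSql(logPath, "hivedemotable"));
            try (ResultSet rs =
                     st.executeQuery("SELECT DISTINCT(entryDate) FROM hivedemotable")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```

When the statement is sent over JDBC, LOCAL refers to the file system of the machine running the Hive server, so the log file has to be readable there.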

Note

  • SELECT * uses a fetch task, equivalent to hadoop fs -cat $file_name, rather than a MapReduce job, so it is fast.
  • SELECT DISTINCT(entryDate) runs as a MapReduce job and is slower.
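
The difference can be pictured with a small in-memory analogy. The sketch below is not how Hive executes queries. It is a toy, with a made-up class name and made-up rows, that contrasts a plain pass over the rows with the map, group, and reduce phases a distinct query needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy analogy only; Hive's real execution is distributed and far more involved.
public class FetchVsDistinctToy {
    // Analogue of SELECT *: one pass that hands every row back unchanged.
    static List<String> selectAll(List<String> rows) {
        return new ArrayList<>(rows);
    }

    // Analogue of SELECT DISTINCT(entryDate): extract a key from each row (map),
    // bring equal keys together (shuffle), and emit one value per group (reduce).
    static List<String> selectDistinctEntryDates(List<String> rows) {
        Map<String, Integer> groups = new TreeMap<>();
        for (String row : rows) {
            String entryDate = row.substring(0, 15);  // map: "Aug 10 09:15:42"
            groups.merge(entryDate, 1, Integer::sum); // shuffle: equal keys meet
        }
        return new ArrayList<>(groups.keySet());      // reduce: one row per key
    }

    public static void main(String[] args) {
        // Made-up rows in the yum.log shape used by this project.
        List<String> rows = Arrays.asList(
            "Aug 10 09:15:42 Installed: example-a",
            "Aug 10 09:15:42 Installed: example-b",
            "Aug 11 10:00:00 Updated: example-c");
        System.out.println(selectAll(rows).size() + " rows fetched");
        System.out.println(selectDistinctEntryDates(rows));
    }
}
```

The first method never needs to look at more than one row at a time, while the second must see every row before it can answer, which is the extra work the MapReduce job performs.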