Author: Suman Lama
- A demo project to illustrate the use of Hive.
- This project uses the yum.log file of a Red Hat Linux system. It loads the log entries into a Hive table stored as a text file, with entryDate and entryMessage as its columns.
- Add the jar files in the hive directory.
- Create a database with:
CREATE DATABASE databaseName;
- Create a table with:
CREATE TABLE tableName
- In the table definition, use SerDe (Serializer/Deserializer) properties as follows. (A SerDe is used to read data from elsewhere and write it to HDFS in the form you require; in this case, by means of a regular expression.)
- Use the row format with the following regex to split each line into two groups:
(^[a-zA-Z]{3} \d{2} \d{2}:\d{2}:\d{2}) (.*)
- Store as a text file with:
'output.format.string' = '%1$s %2$s'
where %1$s and %2$s represent the two groups from the regex above (the entry date and the entry message).
- Load the log file with:
LOAD DATA LOCAL INPATH {location of log} INTO TABLE HiveDemoTable;
- Run a query:
SELECT DISTINCT(entryDate) FROM HiveDemoTable;
- Import the project into Eclipse and run Main to create the database and table and then load the data.
- Then run ReadAll to read all entries for Aug 10.
- Run ReadDistinct to read the distinct entry dates.
- Run DropTable to drop the table.
[NOTE: Run Main again after dropping the table.]
Select * uses a fetch task, equivalent to hadoop fs -cat $file_name, rather than a MapReduce job, so it is fast.
Select distinct(entryDate) uses a MapReduce task and is slower.
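
To make the regex and the output.format.string setting from the table definition above more concrete, here is a minimal standalone Java sketch of what they do to a single log line. It does not talk to Hive at all. The class name YumLogRegexDemo, its helper methods, and the sample log line are assumptions made up for this illustration and are not part of the project.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the demo project.
class YumLogRegexDemo {
    // Same regex as in the table definition: group 1 is the timestamp
    // (entryDate), group 2 is the rest of the line (entryMessage).
    static final Pattern LINE =
        Pattern.compile("(^[a-zA-Z]{3} \\d{2} \\d{2}:\\d{2}:\\d{2}) (.*)");

    // Returns {entryDate, entryMessage}, or null if the line does not match.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // Rebuilds a text line from the two columns using the same format
    // string as 'output.format.string'.
    static String format(String[] cols) {
        return String.format("%1$s %2$s", cols[0], cols[1]);
    }

    public static void main(String[] args) {
        // Made-up sample line in the usual yum.log layout.
        String line = "Aug 10 09:15:32 Installed: httpd-2.2.15-29.el6.x86_64";
        String[] cols = split(line);
        System.out.println("entryDate    = " + cols[0]);
        System.out.println("entryMessage = " + cols[1]);
        System.out.println("round trip   = " + format(cols));
    }
}
```

Because the whole timestamp lands in the first group, entryDate holds values such as "Aug 10 09:15:32" rather than just the day.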