/Hive-JDBC-Storage-Handler

Hive Storage Handler for JDBC

Primary LanguageJavaOtherNOASSERTION

#Hive Storage Handler for JDBC#

The Hive Storage Handler For JDBC by Qubole, which is a fork of HiveJdbcStorageHandler, helps users read from and write to JDBC databases using Hive, and also enabling them to run SQL queries to analyze data that resides in JDBC tables. Optimizations such as FilterPushDown have also been added.

##Building from Source##

  • Download the code from Github:
  $ git clone https://github.com/qubole/Hive-JDBC-storage-Handler.git
  $ cd Hive-JDBC-storage-Handler
  • Build using Maven (add -DskipTests to build without running tests):
  $ mvn clean install -Phadoop-1
  • The JARs for the storage handler can be found in the target/ folder. Use qubole-hive-JDBC-0.0.4.jar in the hive session (see below).

##Usage##

  • Add the JAR to the Hive session. <path-to-jar> is the path to the above mentioned JAR. For using this with Qubole hive, upload the JAR to an S3 bucket and provide its path.
  ADD JAR <path-to-jar>;
  • Each record in the JDBC corresponds to a row in the Hive table.

  • While creating the Hive table, use

  STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'

##Table Creation## Table can be created in 2 ways:

  • First is, Column mappings can be explicitly given along with the table creation statement.
DROP TABLE HiveTable;
CREATE EXTERNAL TABLE HiveTable(
  id INT,
  id_double DOUBLE,
  names STRING,
  test INT
)
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="root",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name"="JDBCTable",
  "mapred.jdbc.password"="",
  "mapred.jdbc.hive.lazy.split"= "false"
);

  • Second is, no table mappings are specified, SerDe class automatically generates those mappings.
CREATE EXTERNAL TABLE HiveTable
row format serde 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcSerDe'
STORED BY 'org.apache.hadoop.hive.jdbc.storagehandler.JdbcStorageHandler'
TBLPROPERTIES (
  "mapred.jdbc.driver.class"="com.mysql.jdbc.Driver",
  "mapred.jdbc.url"="jdbc:mysql://localhost:3306/rstore",
  "mapred.jdbc.username"="root",
  "mapred.jdbc.input.table.name"="JDBCTable",
  "mapred.jdbc.output.table.name" = "JDBCTable",
  "mapred.jdbc.password"="",
  "mapred.jdbc.hive.lazy.split"= "false"
);
NOTE: "mapred.jdbc.hive.lazy.split"= "true" property enables 
       split computation to be done by mappers internally.

##Sample Queries##

HIVE-JDBC Storage Handeler supports alomost all types of possible SQL queries. Some examples of supported queries are:

####Queries to Read from DB ####

* select * from HiveTable;
* Select count(*) from HiveTable;
* select id from HiveTable where id > 50000;
* select names from HiveTable;
* select * from HiveTable where names like ‘D%’;
* SELECT * FROM HiveTable ORDER BY id DESC;

####Group By Queries ####

* select id, sum(id_double) as sum_double from HiveTable group by id;

Join Queries

* select HiveTable_1.*, HiveTable_2.* from HiveTable_1 a join HiveTable_2 b 
  on (a.id = b.id) where a.id > 90000 and b.id > 97000 ;

Queries to insert data into DB

* Insert Into Table HiveTable_1 select * from HiveTable_2;
*Insert Into Table HiveTable_1 select * from HiveTable_2 
 where id > 50000 and upper(names) = 'ANN';

Support For FilterPushDown

 set hive.optimize.ppd = false;

##Contributions##