RHive - rhive@nexr.com

NexR RHive 0.0-6
================

	RHive is an R extension that enables distributed computing through Hive queries.
	It lets you run HQL from R, and use R objects and R functions inside Hive.


Installation Guide
==================
1. Install Hadoop
	1-1. Single Node (http://hadoop.apache.org/common/docs/r0.20.203.0/single_node_setup.html)
	1-2. Cluster Node (http://hadoop.apache.org/common/docs/r0.20.203.0/cluster_setup.html)
	1-3. Set HADOOP_HOME on the local machine where R runs.
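Before moving on, it can help to confirm that step 1-3 took effect. The following is a minimal sketch of such a check, to be run on the machine where R runs. It assumes a POSIX shell and the 0.20.x tarball layout, with the launcher at $HADOOP_HOME/bin/hadoop. `check_hadoop_home` is a hypothetical name and is not part of RHive.

```shell
#!/bin/sh
# Hypothetical helper (not part of RHive): verify that HADOOP_HOME is set
# and points at a Hadoop install that contains the bin/hadoop launcher.
check_hadoop_home() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
        echo "no executable bin/hadoop under $HADOOP_HOME" >&2
        return 1
    fi
    echo "HADOOP_HOME=$HADOOP_HOME"
}

# Example (install path is illustrative):
#   export HADOOP_HOME=/usr/local/hadoop-0.20.203.0
#   check_hadoop_home || exit 1
```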
	
2. Install Hive
	2-1. Install Hive on the local machine and on the remote machine where the NameNode or the Hive Server runs.
	2-2. Installation Guide
	     (https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration)
	2-3. Set HIVE_HOME on the local machine where R runs.
	2-4. Launch the Hive Server on the remote machine with the following command:
		 e.g. >$HIVE_HOME/bin/hive --service hiveserver
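The command in 2-4 runs the Hive Server in the foreground of the terminal that started it. The following is a minimal sketch that starts it in the background instead. It assumes a POSIX shell on the remote machine with `nohup` available. `start_hiveserver` and its default log file name `hiveserver.log` are hypothetical choices. Only the `--service hiveserver` invocation comes from the step above.

```shell
#!/bin/sh
# Hypothetical helper (not part of RHive): start the Hive Server from step 2-4
# in the background, send its output to a log file, and print its PID.
start_hiveserver() {
    log_file="${1:-hiveserver.log}"   # assumed default log location
    if [ -z "${HIVE_HOME:-}" ] || [ ! -x "$HIVE_HOME/bin/hive" ]; then
        echo "HIVE_HOME is not set or has no executable bin/hive" >&2
        return 1
    fi
    # nohup keeps the server running after the launching shell exits.
    nohup "$HIVE_HOME/bin/hive" --service hiveserver > "$log_file" 2>&1 &
    echo $!
}
```

For example, `start_hiveserver /tmp/hiveserver.log` prints the PID of the background server. If `rhive.connect` fails later, the log file is the first place to look.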
		
3. Install R and Packages
	3-1. Install R.
		3-1-1. R must also be installed on all tasktracker nodes.
	3-2. Install rJava.
		3-2-1. rJava only needs to be installed on the local machine.
	3-3-A. Rserve mode: install Rserve
		3-3-A-1. Install Rserve on all tasktracker nodes.
		3-3-A-2. On all tasktracker nodes, set RHIVE_DATA to the path used as the repository for R objects and R functions.
			   e.g. >export RHIVE_DATA=/rhive/data
		3-3-A-3. Create the configuration file /etc/Rserv.conf on all tasktracker nodes,
			   and add the line 'remote enable' to it to allow remote connections.
		3-3-A-4. Launch Rserve on all tasktracker nodes.
			   e.g. >R CMD Rserve
	3-3-B. No Rserve mode (optional): set up the tasktracker nodes
		3-3-B-1. On all tasktracker nodes, set RHIVE_DATA to the path used as the repository for R objects and R functions.
			   e.g. >export RHIVE_DATA=/rhive/data
		3-3-B-2. Add the R_HOME path to $HADOOP_HOME/conf/hadoop-env.sh.
			   e.g. >export R_HOME=/usr/lib/R
	3-4. Install RUnit.
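Steps 3-3-A-3 and 3-3-B-2 both come down to adding one line to a file on every tasktracker node. The following is a minimal sketch of an idempotent helper for that. It assumes a POSIX shell and write access to the target files. `append_once` is a hypothetical name. The 'remote enable' line, /etc/Rserv.conf, and the R_HOME line are taken from the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of RHive): append LINE to FILE unless FILE
# already contains exactly that line, so node setup can be re-run safely.
append_once() {
    file=$1
    line=$2
    touch "$file" || return 1
    # Already present as a whole line: nothing to do.
    if grep -qxF -- "$line" "$file"; then
        return 0
    fi
    # Avoid gluing the new line onto an unterminated last line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example use on a tasktracker node (paths from the steps above):
#   Rserve mode (3-3-A-3):
#     append_once /etc/Rserv.conf 'remote enable'
#   No Rserve mode (3-3-B-2):
#     append_once "$HADOOP_HOME/conf/hadoop-env.sh" 'export R_HOME=/usr/lib/R'
```

The helper checks for the exact line before appending. As a result, the same setup script can be run again on a node without creating duplicate entries.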
	
4. Install RHive
	4-1. Required software: Apache Ant.
	4-2. R CMD INSTALL RHive_0.0-6.tar.gz
	4-3. If HADOOP_HOME is not set, do the following:
		4-3-1. Copy the RUDF/RUDAF library (rhive_udf.jar) to the HDFS path '/rhive/lib/'
		       using this command: 'hadoop fs -put rhive_udf.jar /rhive/lib/rhive_udf.jar'.
		       The jar file is located under $HIVE_HOME/lib.
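Step 4-3-1 can be wrapped in a small function. The following is a minimal sketch that assumes a POSIX shell, the `hadoop` command on PATH, and the jar at $HIVE_HOME/lib/rhive_udf.jar as stated above. `upload_rhive_udf` is a hypothetical name. The existence check with `hadoop fs -test -e` is an addition of this sketch, because `hadoop fs -put` in Hadoop 0.20 refuses to overwrite an existing file.

```shell
#!/bin/sh
# Hypothetical helper (not part of RHive): copy rhive_udf.jar from
# $HIVE_HOME/lib to /rhive/lib/ on HDFS, as in step 4-3-1.
upload_rhive_udf() {
    dest=/rhive/lib/rhive_udf.jar
    if [ -z "${HIVE_HOME:-}" ] || [ ! -f "$HIVE_HOME/lib/rhive_udf.jar" ]; then
        echo "rhive_udf.jar not found under \$HIVE_HOME/lib" >&2
        return 1
    fi
    # Skip the upload if the jar is already on HDFS; -put would fail otherwise.
    if hadoop fs -test -e "$dest"; then
        echo "$dest already exists on HDFS"
        return 0
    fi
    hadoop fs -put "$HIVE_HOME/lib/rhive_udf.jar" "$dest"
}
```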

5. Launch RHive
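	5-1. Launch R.
	5-2. >library(RHive)
	5-3. >rhive.connect("hive-server-ip")
	     Replace hive-server-ip with the host name or IP address of the machine running the Hive Server (step 2-4).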
	
6. Tutorial
	https://github.com/nexr/RHive/wiki/UserGuides


Requirements
============

- Java 1.6
- R 2.13.0
- Rserve 0.6-0
- rJava 0.9-0
- Hadoop 0.20.x (x >= 1)
- Hive 0.8.x (x >= 0)
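A small helper that orders version strings can compare installed versions against this list. The following is a minimal sketch that assumes GNU `sort` (for `sort -V`). It treats each listed version as a lower bound. This is an assumption, because the list does not say whether newer releases work. `version_ge` is a hypothetical name. The helper checks only the lower bound, so the 0.20.x and 0.8.x series constraints for Hadoop and Hive still need a separate check, for example a `case` on the version prefix.

```shell
#!/bin/sh
# Hypothetical helper (not part of RHive): succeed when version $1 >= $2.
# sort -V orders version strings component by component, so $1 >= $2
# exactly when $2 sorts first (or the two are equal).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (R prints "R version X.Y.Z (date)" on its first line):
#   version_ge "$(R --version | head -n 1 | awk '{print $3}')" 2.13.0
```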