Disclaimer:
This repository is a detached fork of the following project: https://github.com/nexr/RHive
Because the original project is inactive and my comments and fixes went unanswered, I decided to continue development in a separate GitHub project. Why not a regular fork? Forks apparently do not appear in searches, which would make this version harder to find for anyone having problems with the outdated one.
NexR RHive 2.0
RHive is an R extension that facilitates distributed computing via Hive queries. RHive allows easy usage of HQL (Hive SQL) in R, and allows easy usage of R objects and R functions in Hive.
Before installing RHive, you must have Hadoop and Hive installed.
Install Hadoop
- Single Node
- Cluster Node
- set HADOOP_HOME on the local machine on which R runs
Install Hive
- install Hive on the local machine and on the remote machine on which the NameNode or the Hive server runs.
- Installation Guide
- set HIVE_HOME on the local machine on which R runs.
- launch the Hive server on the remote machine with the following command. It should run as a background process.
$HIVE_HOME/bin/hive --service hiveserver
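One way to run the command above as a background process is sketched here. It is only an illustration: the function name start_hive_server and the LOG_FILE and PID_FILE variables (with their /tmp defaults) are hypothetical and not part of RHive or Hive.

```shell
# Sketch: start the Hive server in the background and record its PID.
# LOG_FILE and PID_FILE are hypothetical names chosen for this example.
start_hive_server() {
    hs_log_file="${LOG_FILE:-/tmp/hiveserver.log}"
    hs_pid_file="${PID_FILE:-/tmp/hiveserver.pid}"

    if [ -z "${HIVE_HOME:-}" ]; then
        echo "HIVE_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$HIVE_HOME/bin/hive" ]; then
        echo "no executable found at $HIVE_HOME/bin/hive" >&2
        return 1
    fi

    # nohup keeps the server alive after the shell exits; '&' backgrounds it.
    nohup "$HIVE_HOME/bin/hive" --service hiveserver >"$hs_log_file" 2>&1 &
    echo "$!" >"$hs_pid_file"
    echo "Hive server started with PID $(cat "$hs_pid_file")"
}
```

Calling start_hive_server on the remote machine launches the server; the recorded PID can later be passed to kill to stop it.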
Install R and Packages
- install R
- R must be installed on all tasktracker nodes
- install rJava
- install rJava only on the local machine.
- install Rserve
- Rserve must be installed on all tasktracker nodes
- create the configuration file /etc/Rserv.conf on all tasktracker nodes and add the line 'remote enable' to it to allow remote connections.
- launch Rserve on all tasktracker nodes.
- e.g.:
R CMD Rserve
- configure the tasktracker nodes
- add the R_HOME path to $HADOOP_HOME/conf/hadoop-env.sh
- e.g.:
export R_HOME=/usr/lib/R
- install RUnit
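The per-node configuration described above (adding 'remote enable' to /etc/Rserv.conf and exporting R_HOME in hadoop-env.sh) could be scripted roughly as follows. This is a minimal sketch, not something shipped with RHive: the helper names ensure_line and configure_tasktracker_node are made up for this example, and the file paths are passed in as arguments so the sketch can be tried on scratch files first.

```shell
# Sketch: append a line to a file only if that exact line is not already there.
# Assumes the file, if it exists, ends with a newline.
ensure_line() {
    el_line="$1"
    el_file="$2"
    touch "$el_file" || return 1
    grep -qxF -- "$el_line" "$el_file" || printf '%s\n' "$el_line" >>"$el_file"
}

# Hypothetical helper applying both settings to one tasktracker node.
configure_tasktracker_node() {
    ct_rserve_conf="$1"   # e.g. /etc/Rserv.conf
    ct_hadoop_env="$2"    # e.g. $HADOOP_HOME/conf/hadoop-env.sh
    ct_r_home="$3"        # e.g. /usr/lib/R
    ensure_line "remote enable" "$ct_rserve_conf" &&
        ensure_line "export R_HOME=$ct_r_home" "$ct_hadoop_env"
}
```

On a node this might be called as configure_tasktracker_node /etc/Rserv.conf "$HADOOP_HOME/conf/hadoop-env.sh" /usr/lib/R, followed by R CMD Rserve. Running it twice leaves each file with a single copy of the line.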
Install RHive
- Requirements
- ant (needed to build the Java files)
- Installing RHive
- Download source code:
git clone https://github.com/Worvast/RHive.git
- Change your working directory:
cd RHive
- Change active branch to 'ranger':
git fetch
git branch -v -a
git checkout -b ranger remotes/origin/ranger
- Set the environment variables:
export HIVE_HOME=/path/to/your/hive/directory
export HADOOP_HOME=/path/to/your/hadoop/directory
export HADOOP_HDFS_HOME=/path/to/your/hadoop/directory
export HADOOP_MAPREDUCE_HOME=/path/to/your/hadoop/directory
- Build the Java files using ant:
ant build
- Build RHive:
R CMD build RHive
- Install RHive:
R CMD INSTALL RHive_<VERSION>.tar.gz
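Since the ant build depends on the environment variables set earlier in this section, it can help to check them before building. The following is a sketch under that assumption; require_dirs is a hypothetical helper, not part of the RHive build.

```shell
# Sketch: fail if any named variable is unset, empty, or not a directory.
# Arguments are variable NAMES (for example HIVE_HOME), not values.
require_dirs() {
    rd_status=0
    for rd_name in "$@"; do
        eval "rd_value=\${$rd_name:-}"
        if [ -z "$rd_value" ]; then
            echo "$rd_name is not set" >&2
            rd_status=1
        elif [ ! -d "$rd_value" ]; then
            echo "$rd_name points to a missing directory: $rd_value" >&2
            rd_status=1
        fi
    done
    return "$rd_status"
}

# Possible usage from the RHive source directory (left commented out so that
# sourcing this sketch has no side effects):
# require_dirs HIVE_HOME HADOOP_HOME HADOOP_HDFS_HOME HADOOP_MAPREDUCE_HOME && ant build
```

The check reports every problem it finds rather than stopping at the first one, so a single run shows all variables that still need attention.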
Loading RHive and connecting to Hive
- Set the environment variables HIVE_HOME, HADOOP_HOME and HADOOP_CONF_DIR:
- Export them in the shell:
export HIVE_HOME=/path/to/your/hive/directory
export HADOOP_HOME=/path/to/your/hadoop/directory
export HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory
- Or add them to your Renviron file:
HIVE_HOME=/path/to/your/hive/directory
HADOOP_HOME=/path/to/your/hadoop/directory
HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory
- launch R
library(RHive)
rhive.connect(host, port, hiveServer2)
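For the Renviron option above, the entries could be written from the shell as sketched below. The helper set_renviron_var is hypothetical; the sketch assumes a user-level Renviron file such as ~/.Renviron, which R reads at startup, and it replaces an earlier definition of the same variable instead of adding a second one.

```shell
# Sketch: set NAME=VALUE in an Renviron-style file, replacing any old entry.
set_renviron_var() {
    sr_name="$1"
    sr_value="$2"
    sr_file="$3"
    touch "$sr_file" || return 1
    sr_scratch="$sr_file.tmp.$$"
    # Keep every line except an earlier definition of the same variable.
    # grep exits non-zero when nothing is kept, which is fine here.
    grep -v "^$sr_name=" "$sr_file" >"$sr_scratch" || true
    printf '%s=%s\n' "$sr_name" "$sr_value" >>"$sr_scratch"
    mv "$sr_scratch" "$sr_file"
}
```

For example, set_renviron_var HIVE_HOME /path/to/your/hive/directory "$HOME/.Renviron" followed by the same call for HADOOP_HOME and HADOOP_CONF_DIR makes the variables available the next time R is launched.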
Tutorials
Requirements
- Java 1.6
- R 2.13.0
- Rserve 0.6-0
- rJava 0.9-0
- Hadoop 0.20.x (x >= 1)
- Hive 0.8.x (x >= 0)
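If the versions listed above are read as minimums (an assumption; the list does not say so explicitly), a dotted version string can be compared against them with a small helper. version_ge is a hypothetical name, and the sketch only handles purely numeric dotted versions such as 2.13.0, not forms like 0.6-0.

```shell
# Sketch: succeed when version $1 is greater than or equal to version $2.
# Compares numeric, dot-separated components; missing components count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "[.]")
        m = split(b, y, "[.]")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Possible usage, assuming Rscript is on the PATH (commented out on purpose):
# version_ge "$(Rscript -e 'cat(as.character(getRversion()))')" 2.13.0 || echo "R is too old" >&2
```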