Hadoop-Unit is a project that allows testing projects which need the Hadoop ecosystem (Kafka, Solr, HDFS, Hive, HBase, ...).
Moreover, it provides a standalone component which can be run locally and which simulates a Hadoop cluster.
# Build
For Windows users: download a Hadoop distribution, unzip it, and define the system environment variable HADOOP_HOME. You can also set the path in the default.properties files (warning: there are a lot of them...).
To build, launch the command:
mvn package
# Usage
When Hadoop-Unit is started, it should display something like this:
______ __ _________ _____ __ __________
___ / / /_____ ______ /___________________ __ / / /_________(_)_ /_ 1.3
__ /_/ /_ __ `/ __ /_ __ \ __ \__ __ \ _ / / /__ __ \_ /_ __/
_ __ / / /_/ // /_/ / / /_/ / /_/ /_ /_/ / / /_/ / _ / / / / / /_
/_/ /_/ \__,_/ \__,_/ \____/\____/_ .___/ \____/ /_/ /_//_/ \__/
/_/
- ZOOKEEPER [host:127.0.0.1, port:22010]
- HDFS [port:20112]
- HIVEMETA [port:20102]
- HIVESERVER2 [port:20103]
- KAFKA [host:127.0.0.1, port:20111]
- HBASE [port:25111]
- SOLRCLOUD [zh:127.0.0.1:22010, port:8983, collection:collection1]
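A quick way to check that the embedded cluster is alive is to connect to its ZooKeeper instance. Here is a minimal sketch, assuming Apache Curator is on the classpath (an assumed dependency, not something Hadoop-Unit provides); the host and port are the ones shown in the banner:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryOneTime;

public class ZookeeperSmokeTest {
    public static void main(String[] args) throws Exception {
        // 127.0.0.1:22010 is the embedded ZooKeeper advertised in the startup banner
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "127.0.0.1:22010", new RetryOneTime(1000));
        client.start();
        // list the znodes at the root to prove the connection works
        System.out.println(client.getChildren().forPath("/"));
        client.close();
    }
}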
The available components are:
- HDFS
- ZOOKEEPER
- HIVEMETA
- HIVESERVER2
- SOLR
- SOLRCLOUD
- OOZIE
- KAFKA
- HBASE
- MONGODB
- CASSANDRA
- ELASTICSEARCH
However, for compatibility reasons, Solr/SolrCloud and Elasticsearch cannot be run in the same JVM. For this reason, two standalone packages are generated (one bundled with Solr and one bundled with Elasticsearch).
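As an example of talking to the embedded components, the SolrCloud instance can be queried with SolrJ through the ZooKeeper address shown in the banner. A minimal sketch, assuming a SolrJ 5.x dependency (the single-argument CloudSolrClient constructor used here belongs to that era):

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class SolrCloudSmokeTest {
    public static void main(String[] args) throws Exception {
        // zh:127.0.0.1:22010 and collection1 come from the startup banner
        CloudSolrClient client = new CloudSolrClient("127.0.0.1:22010");
        client.setDefaultCollection("collection1");
        SolrPingResponse ping = client.ping();
        System.out.println("Solr ping status: " + ping.getStatus());
        client.close();
    }
}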
## Integration testing (starts each component present on the classpath)
With Maven, add dependencies for the components you need.
Sample:
<dependency>
    <groupId>fr.jetoile.hadoop</groupId>
    <artifactId>hadoop-unit-hdfs</artifactId>
    <version>1.3</version>
    <scope>test</scope>
</dependency>
In your test, do:
@BeforeClass
public static void setup() {
    HadoopBootstrap.INSTANCE.startAll();
}

@AfterClass
public static void tearDown() {
    HadoopBootstrap.INSTANCE.stopAll();
}
## Integration testing v2 (with specific components)
With Maven, add dependencies for the components you need.
Sample:
<dependency>
    <groupId>fr.jetoile.hadoop</groupId>
    <artifactId>hadoop-unit-hdfs</artifactId>
    <version>1.3</version>
    <scope>test</scope>
</dependency>
In your test, do:
@BeforeClass
public static void setup() throws NotFoundServiceException {
    HadoopBootstrap.INSTANCE
            .start(Component.ZOOKEEPER)
            .start(Component.HDFS)
            .start(Component.HIVEMETA)
            .start(Component.HIVESERVER2)
            .startAll();
}

@AfterClass
public static void tearDown() throws NotFoundServiceException {
    HadoopBootstrap.INSTANCE
            .stopAll();
}
## Standalone mode
As said above, Solr/SolrCloud and Elasticsearch are not compatible.
For this reason, two packages are available:
- hadoop-unit-standalone-solr
- hadoop-unit-standalone-elasticsearch
- Unzip hadoop-unit-standalone-<type>-<version>.tar.gz
- Change conf/default.properties
- Change conf/hadoop.properties

Start in the foreground with:
./bin/hadoop-unit-standalone-<type> console
Start in the background with:
./bin/hadoop-unit-standalone-<type> start
Stop with:
./bin/hadoop-unit-standalone-<type> stop
## Shell Usage
Hadoop-Unit can be used with common tools such as:
- hbase shell
- kafka-console command
- hdfs command
- hive shell
### Kafka-console command
- Download and unzip Kafka
- From the KAFKA_HOME/bin directory (or KAFKA_HOME/bin/windows on Windows), execute the command:
kafka-console-consumer --zookeeper localhost:22010 --topic topic
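The embedded broker can also be reached from code. A minimal sketch with the Kafka producer API, assuming a kafka-clients dependency compatible with the embedded broker (127.0.0.1:20111 is the Kafka address from the startup banner; the topic name is illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaSmokeTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        // broker host/port as advertised by Hadoop-Unit's startup banner
        props.put("bootstrap.servers", "127.0.0.1:20111");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "topic" matches the topic read by the console consumer above
            producer.send(new ProducerRecord<>("topic", "key", "hello from hadoop-unit"));
        }
    }
}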
### HBase Shell
- Download and unzip HBase
- Set the variable HBASE_HOME
- Edit the file HBASE_HOME/conf/hbase-site.xml:
<configuration>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>127.0.0.1:22010</value>
    </property>
    <property>
        <name>zookeeper.znode.parent</name>
        <value>/hbase-unsecure</value>
    </property>
</configuration>
- From the HBASE_HOME/bin directory, execute the command:
hbase shell
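The same settings can be used programmatically. Here is a minimal sketch with the HBase Java client (hbase-client is an assumed dependency; the quorum, client port and znode parent mirror the hbase-site.xml above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // same values as in hbase-site.xml above
        conf.set("hbase.zookeeper.quorum", "127.0.0.1");
        conf.set("hbase.zookeeper.property.clientPort", "22010");
        conf.set("zookeeper.znode.parent", "/hbase-unsecure");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // list tables to prove the connection works
            for (TableName table : admin.listTableNames()) {
                System.out.println(table.getNameAsString());
            }
        }
    }
}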
### HDFS command
- From the HADOOP_HOME/bin directory, execute the command:
hdfs dfs -ls hdfs://localhost:20112/
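The same namenode can be reached from Java through the Hadoop FileSystem API. A minimal sketch, assuming a hadoop-client dependency (port 20112 is the HDFS port from the startup banner):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        // hdfs://localhost:20112 is the embedded namenode from the startup banner
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:20112/"), new Configuration());
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}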
### Hive Shell
- Download and unzip Hive
- Edit the file HIVE_HOME/conf/hive-site.xml:
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://127.0.0.1:20102</value>
    </property>
</configuration>
- From the HIVE_HOME/bin directory, execute the command:
hive
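Queries can also be sent to the embedded HiveServer2 over JDBC. A minimal sketch, assuming the hive-jdbc driver is on the classpath (port 20103 is the HIVESERVER2 port from the startup banner):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveSmokeTest {
    public static void main(String[] args) throws Exception {
        // register the HiveServer2 JDBC driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // 20103 is the HIVESERVER2 port from the startup banner
        try (Connection connection = DriverManager.getConnection(
                     "jdbc:hive2://localhost:20103/default", "", "");
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SHOW TABLES")) {
            while (resultSet.next()) {
                System.out.println(resultSet.getString(1));
            }
        }
    }
}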
# Sample
See hadoop-unit-standalone/src/test/java/fr/jetoile/hadoopunit/integrationtest
# Maven Plugin usage
A Maven plugin is provided for integration tests only.
## Embedded mode
To use it, add something like this to the project's pom:
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>fr.jetoile.hadoop</groupId>
        <artifactId>hadoop-unit-hdfs</artifactId>
        <version>1.3</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>fr.jetoile.hadoop</groupId>
        <artifactId>hadoop-unit-hive</artifactId>
        <version>1.3</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>fr.jetoile.hadoop</groupId>
        <artifactId>hadoop-unit-client-hdfs</artifactId>
        <version>1.3</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>fr.jetoile.hadoop</groupId>
        <artifactId>hadoop-unit-client-hive</artifactId>
        <version>1.3</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>fr.jetoile.hadoop</groupId>
        <artifactId>hadoop-unit-client-spark</artifactId>
        <version>1.3</version>
        <scope>test</scope>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <configuration>
                <excludes>
                    <exclude>**/*IntegrationTest.java</exclude>
                </excludes>
            </configuration>
            <executions>
                <execution>
                    <id>integration-test</id>
                    <goals>
                        <goal>test</goal>
                    </goals>
                    <phase>integration-test</phase>
                    <configuration>
                        <excludes>
                            <exclude>none</exclude>
                        </excludes>
                        <includes>
                            <include>**/*IntegrationTest.java</include>
                        </includes>
                    </configuration>
                </execution>
            </executions>
        </plugin>

        <plugin>
            <artifactId>hadoop-unit-maven-plugin</artifactId>
            <groupId>fr.jetoile.hadoop</groupId>
            <version>1.3</version>
            <executions>
                <execution>
                    <id>start</id>
                    <goals>
                        <goal>embedded-start</goal>
                    </goals>
                    <phase>pre-integration-test</phase>
                </execution>
            </executions>
            <configuration>
                <values>
                    <value>HDFS</value>
                    <value>ZOOKEEPER</value>
                    <value>HIVEMETA</value>
                    <value>HIVESERVER2</value>
                </values>
            </configuration>
        </plugin>
    </plugins>
</build>
Values can be:
- HDFS
- ZOOKEEPER
- HIVEMETA
- HIVESERVER2
- SOLR
- SOLRCLOUD
- OOZIE
- KAFKA
- HBASE
- MONGODB
- CASSANDRA
- ELASTICSEARCH
Here is a sample integration test:
import static org.assertj.core.api.Assertions.assertThat;
import static org.junit.Assert.assertEquals;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

import org.apache.commons.configuration.Configuration;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.configuration.PropertiesConfiguration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.BeforeClass;
import org.junit.Test;
// plus the Hadoop-Unit client imports (HdfsUtils, Config, BootstrapException)

public class HdfsBootstrapIntegrationTest {

    static private Configuration configuration;

    @BeforeClass
    public static void setup() throws BootstrapException {
        try {
            configuration = new PropertiesConfiguration("default.properties");
        } catch (ConfigurationException e) {
            throw new BootstrapException("bad config", e);
        }
    }

    @Test
    public void hdfsShouldStart() throws Exception {
        FileSystem hdfsFsHandle = HdfsUtils.INSTANCE.getFileSystem();

        // Write a test string to a test file
        FSDataOutputStream writer = hdfsFsHandle.create(new Path(configuration.getString(Config.HDFS_TEST_FILE_KEY)));
        writer.writeUTF(configuration.getString(Config.HDFS_TEST_STRING_KEY));
        writer.close();

        // Read the file back and compare to the test string
        FSDataInputStream reader = hdfsFsHandle.open(new Path(configuration.getString(Config.HDFS_TEST_FILE_KEY)));
        assertEquals(reader.readUTF(), configuration.getString(Config.HDFS_TEST_STRING_KEY));
        reader.close();
        hdfsFsHandle.close();

        // Check the WebHDFS REST endpoint as well
        URL url = new URL(
                String.format("http://localhost:%s/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=guest",
                        configuration.getInt(Config.HDFS_NAMENODE_HTTP_PORT_KEY)));
        URLConnection connection = url.openConnection();
        connection.setRequestProperty("Accept-Charset", "UTF-8");
        BufferedReader response = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        String line = response.readLine();
        response.close();
        assertThat(line).isEqualTo("{\"Path\":\"/user/guest\"}");
    }
}
## Remote mode
This plugin starts/stops a local hadoop-unit-standalone installation.
To use it, add something like this to the project's pom:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
        <excludes>
            <exclude>**/*IntegrationTest.java</exclude>
        </excludes>
    </configuration>
    <executions>
        <execution>
            <id>integration-test</id>
            <goals>
                <goal>test</goal>
            </goals>
            <phase>integration-test</phase>
            <configuration>
                <excludes>
                    <exclude>none</exclude>
                </excludes>
                <includes>
                    <include>**/*IntegrationTest.java</include>
                </includes>
            </configuration>
        </execution>
    </executions>
</plugin>

<plugin>
    <artifactId>hadoop-unit-maven-plugin</artifactId>
    <groupId>fr.jetoile.hadoop</groupId>
    <version>1.3</version>
    <executions>
        <execution>
            <id>start</id>
            <goals>
                <goal>start</goal>
            </goals>
            <phase>pre-integration-test</phase>
        </execution>
    </executions>
    <configuration>
        <hadoopUnitPath>/home/khanh/tools/hadoop-unit-standalone</hadoopUnitPath>
        <exec>./hadoop-unit-standalone</exec>
        <values>
            <value>ZOOKEEPER</value>
            <value>HDFS</value>
            <value>HIVEMETA</value>
            <value>HIVESERVER2</value>
        </values>
        <outputFile>/tmp/toto.txt</outputFile>
    </configuration>
</plugin>

<plugin>
    <artifactId>hadoop-unit-maven-plugin</artifactId>
    <groupId>fr.jetoile.hadoop</groupId>
    <version>1.3</version>
    <executions>
        <execution>
            <id>stop</id>
            <goals>
                <goal>stop</goal>
            </goals>
            <phase>post-integration-test</phase>
        </execution>
    </executions>
    <configuration>
        <hadoopUnitPath>/home/khanh/tools/hadoop-unit-standalone</hadoopUnitPath>
        <exec>./hadoop-unit-standalone</exec>
        <outputFile>/tmp/toto.txt</outputFile>
    </configuration>
</plugin>
Values can be:
- HDFS
- ZOOKEEPER
- HIVEMETA
- HIVESERVER2
- SOLR
- SOLRCLOUD
- OOZIE
- KAFKA
- HBASE
- MONGODB
- CASSANDRA
- ELASTICSEARCH
hadoopUnitPath is not mandatory, but if it is not set, the system environment variable HADOOP_UNIT_HOME must be defined.
The exec variable is optional.
If both are set, HADOOP_UNIT_HOME overrides hadoopUnitPath.
Warning: this plugin will modify hadoop.properties and delete the Hadoop-Unit logs.
Here is a sample integration test:
import static org.assertj.core.api.Assertions.assertThat;
import static org.junit.Assert.assertEquals;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

import org.apache.commons.configuration.Configuration;
import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.configuration.PropertiesConfiguration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.BeforeClass;
import org.junit.Test;
// plus the Hadoop-Unit client imports (HdfsUtils, Config, BootstrapException)

public class HdfsBootstrapIntegrationTest {

    static private Configuration configuration;

    @BeforeClass
    public static void setup() throws BootstrapException {
        try {
            configuration = new PropertiesConfiguration("default.properties");
        } catch (ConfigurationException e) {
            throw new BootstrapException("bad config", e);
        }
    }

    @Test
    public void hdfsShouldStart() throws Exception {
        FileSystem hdfsFsHandle = HdfsUtils.INSTANCE.getFileSystem();

        // Write a test string to a test file
        FSDataOutputStream writer = hdfsFsHandle.create(new Path(configuration.getString(Config.HDFS_TEST_FILE_KEY)));
        writer.writeUTF(configuration.getString(Config.HDFS_TEST_STRING_KEY));
        writer.close();

        // Read the file back and compare to the test string
        FSDataInputStream reader = hdfsFsHandle.open(new Path(configuration.getString(Config.HDFS_TEST_FILE_KEY)));
        assertEquals(reader.readUTF(), configuration.getString(Config.HDFS_TEST_STRING_KEY));
        reader.close();
        hdfsFsHandle.close();

        // Check the WebHDFS REST endpoint as well
        URL url = new URL(
                String.format("http://localhost:%s/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=guest",
                        configuration.getInt(Config.HDFS_NAMENODE_HTTP_PORT_KEY)));
        URLConnection connection = url.openConnection();
        connection.setRequestProperty("Accept-Charset", "UTF-8");
        BufferedReader response = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        String line = response.readLine();
        response.close();
        assertThat(line).isEqualTo("{\"Path\":\"/user/guest\"}");
    }
}
# Components available
- SolrCloud 5.4.1
- Kafka
- Hive (metastore and server2)
- Hdfs
- Zookeeper
- Oozie (WIP)
- HBase
- MongoDB
- Cassandra 3.4
- ElasticSearch 5.0-alpha2
Built on:
- hadoop-mini-cluster-0.1.6 (aka. HDP 2.4.0)
- cassandra-unit-3.0.0.1
Usage:
- Download and unzip Hadoop
- [for Oozie only] Download and unzip Oozie (http://s3.amazonaws.com/public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.4.0/tars/oozie-4.2.0.2.3.4.0-3485-distro.tar.gz)
- Edit default.properties and indicate HADOOP_HOME, or set your HADOOP_HOME environment variable
- Edit default.properties and indicate oozie.sharelib.path
Todo:
- Make client utils for Kafka produce/consume
- Make a sample with Spark Streaming and Kafka
Issues:
- Oozie does not work on Windows 7 (see http://stackoverflow.com/questions/25790319/getting-access-denied-error-while-running-hadoop-2-3-mapreduce-jobs-in-windows-7)
- Integrate Phoenix
- Can only manage one Solr collection
- Better docs ;)
This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.