hadoop-unit

Hadoop-Unit is a project which allows testing projects that depend on Hadoop ecosystem components such as Kafka, Solr, HDFS, Hive, HBase, ...

It also provides a standalone component which can be run locally and which simulates a Hadoop cluster.

#Build

Windows users need to download a Hadoop distribution, unzip it, and define the system environment variable HADOOP_HOME. The path can also be set in the default.properties files (warning: there are a lot of them).

To build, launch the command:

mvn package

#Usage

When Hadoop Unit is started, it should display something like this:

           ______  __      _________                         _____  __      __________
           ___  / / /_____ ______  /___________________      __  / / /_________(_)_  /_ 1.3
           __  /_/ /_  __ `/  __  /_  __ \  __ \__  __ \     _  / / /__  __ \_  /_  __/
           _  __  / / /_/ // /_/ / / /_/ / /_/ /_  /_/ /     / /_/ / _  / / /  / / /_
           /_/ /_/  \__,_/ \__,_/  \____/\____/_  .___/      \____/  /_/ /_//_/  \__/
                                               /_/
 - ZOOKEEPER [host:127.0.0.1, port:22010]
 - HDFS [port:20112]
 - HIVEMETA [port:20102]
 - HIVESERVER2 [port:20103]
 - KAFKA [host:127.0.0.1, port:20111]
 - HBASE [port:25111]
 - SOLRCLOUD [zh:127.0.0.1:22010, port:8983, collection:collection1]
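The ports shown in the banner can be checked from Java. The following is a minimal sketch using only the JDK; it is not part of Hadoop Unit. `BannerPortCheck`, `parsePorts` and `isListening` are hypothetical names, and components listed without a host are assumed to listen on 127.0.0.1.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop Unit.
class BannerPortCheck {

    // Matches lines such as " - KAFKA [host:127.0.0.1, port:20111]".
    private static final Pattern COMPONENT_LINE =
            Pattern.compile("-\\s+(\\w+)\\s+\\[.*?\\bport:(\\d+)");

    // Extracts component name -> port from the startup banner text.
    static Map<String, Integer> parsePorts(String banner) {
        Map<String, Integer> ports = new LinkedHashMap<>();
        for (String line : banner.split("\\R")) {
            Matcher m = COMPONENT_LINE.matcher(line);
            if (m.find()) {
                ports.put(m.group(1), Integer.parseInt(m.group(2)));
            }
        }
        return ports;
    }

    // Returns true if a TCP connection can be opened within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String banner = String.join("\n",
                " - ZOOKEEPER [host:127.0.0.1, port:22010]",
                " - HDFS [port:20112]",
                " - KAFKA [host:127.0.0.1, port:20111]");
        // Assumption: every component is reachable on 127.0.0.1.
        parsePorts(banner).forEach((name, port) ->
                System.out.println(name + " port " + port
                        + " listening=" + isListening("127.0.0.1", port, 500)));
    }
}
```

An open port only shows that something is listening; it does not prove that the component is fully initialized.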

The available components are:

  • HDFS
  • ZOOKEEPER
  • HIVEMETA
  • HIVESERVER2
  • SOLR
  • SOLRCLOUD
  • OOZIE
  • KAFKA
  • HBASE
  • MONGODB
  • CASSANDRA
  • ELASTICSEARCH

However, for compatibility reasons, Solr/SolrCloud and Elasticsearch cannot run in the same JVM. For this reason, two standalone packages are generated: one compatible with Solr and one compatible with Elasticsearch.

##Integration testing (will start each component present in the classpath)

With Maven, add the dependencies of the components which are needed.

Sample:

<dependency>
    <groupId>fr.jetoile.hadoop</groupId>
    <artifactId>hadoop-unit-hdfs</artifactId>
    <version>1.3</version>
    <scope>test</scope>
</dependency>

In the test, do:

@BeforeClass
public static void setup() {
    HadoopBootstrap.INSTANCE.startAll();
}

@AfterClass
public static void tearDown() {
    HadoopBootstrap.INSTANCE.stopAll();
}

##Integration testing v2 (with specific component)

With Maven, add the dependencies of the components which are needed.

Sample:

<dependency>
    <groupId>fr.jetoile.hadoop</groupId>
    <artifactId>hadoop-unit-hdfs</artifactId>
    <version>1.3</version>
    <scope>test</scope>
</dependency>

In the test, do:

@BeforeClass
public static void setup() throws NotFoundServiceException {
    HadoopBootstrap.INSTANCE
        .start(Component.ZOOKEEPER)
        .start(Component.HDFS)
        .start(Component.HIVEMETA)
        .start(Component.HIVESERVER2)
        .startAll();
}

@AfterClass
public static void tearDown() throws NotFoundServiceException {
    HadoopBootstrap.INSTANCE
        .stopAll();
}

##Standalone mode

As said above, Solr/SolrCloud and Elasticsearch are not compatible.

For this reason, two packages are available:

  • hadoop-unit-standalone-solr
  • hadoop-unit-standalone-elasticsearch

  • Unzip hadoop-unit-standalone-<type>-<version>.tar.gz
  • Change conf/default.properties
  • Change conf/hadoop.properties

Start in the foreground with:

./bin/hadoop-unit-standalone-<type> console

Start in the background with:

./bin/hadoop-unit-standalone-<type> start

Stop with:

./bin/hadoop-unit-standalone-<type> stop
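After editing conf/default.properties, it can be useful to read the configured ports back before starting the standalone. The sketch below uses only `java.util.Properties`. The key name `zookeeper.port` is an assumption about the file's contents, and `DefaultPropertiesReader` is a hypothetical helper, not part of Hadoop Unit.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of Hadoop Unit.
class DefaultPropertiesReader {

    // Loads key=value pairs from any character source.
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads an integer port, or returns the fallback when the key is absent.
    static int port(Properties props, String key, int fallback) {
        String value = props.getProperty(key);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "conf/default.properties";
        try (Reader reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            Properties props = load(reader);
            // "zookeeper.port" is an assumed key name; adjust it to the actual file.
            System.out.println("zookeeper.port = " + port(props, "zookeeper.port", -1));
        } catch (IOException | UncheckedIOException e) {
            System.err.println("Cannot read " + path + ": " + e.getMessage());
        }
    }
}
```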

##Shell Usage

Hadoop-unit can be used with common tools such as:

  • hbase shell
  • kafka-console command
  • hdfs command
  • hive shell

###Kafka-console command

  • Download and unzip kafka
  • From directory KAFKA_HOME/bin (or KAFKA_HOME/bin/windows on Windows), execute the command:
kafka-console-consumer --zookeeper localhost:22010 --topic topic

###HBase Shell

  • Download and unzip HBase
  • Set the variable HBASE_HOME
  • Edit the file HBASE_HOME/conf/hbase-site.xml:
<configuration>
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>127.0.0.1:22010</value>
	</property>
	<property>
		<name>zookeeper.znode.parent</name>
		<value>/hbase-unsecure</value>
	</property>
</configuration>
  • From directory HBASE_HOME/bin, execute command:
hbase shell

###HDFS command

  • From directory HADOOP_HOME/bin, execute command:
hdfs dfs -ls hdfs://localhost:20112/

###Hive Shell

  • Download and unzip Hive
  • Edit the file HIVE_HOME/conf/hive-site.xml:
<configuration>
	<property>
		<name>hive.metastore.uris</name>
		<value>thrift://127.0.0.1:20102</value>
	</property>
</configuration>
  • From directory HIVE_HOME/bin, execute command:
hive
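Besides the Hive shell, HiveServer2 (port 20103 in the startup banner above) can in principle be reached over JDBC. The following is only a sketch under assumptions: it needs the Hive JDBC driver on the classpath at runtime, and the user name, the `default` database and the class name `HiveServer2Probe` are illustrative choices, not something Hadoop Unit prescribes.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of Hadoop Unit.
class HiveServer2Probe {

    // Builds a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        String url = jdbcUrl("127.0.0.1", 20103, "default");
        // Assumption: no authentication is configured, so any user name is accepted.
        try (Connection connection = DriverManager.getConnection(url, "user", "");
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            // Also reached when the Hive JDBC driver is missing from the classpath.
            System.err.println("Could not query " + url + ": " + e.getMessage());
        }
    }
}
```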

#Sample

See hadoop-unit-standalone/src/test/java/fr/jetoile/hadoopunit/integrationtest

#Maven Plugin usage

A Maven plugin is provided for integration tests only.

##Embedded mode

To use it, add something like this to the project's pom:

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>fr.jetoile.hadoop</groupId>
        <artifactId>hadoop-unit-hdfs</artifactId>
        <version>1.3</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>fr.jetoile.hadoop</groupId>
        <artifactId>hadoop-unit-hive</artifactId>
        <version>1.3</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>fr.jetoile.hadoop</groupId>
        <artifactId>hadoop-unit-client-hdfs</artifactId>
        <version>1.3</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>fr.jetoile.hadoop</groupId>
        <artifactId>hadoop-unit-client-hive</artifactId>
        <version>1.3</version>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>fr.jetoile.hadoop</groupId>
        <artifactId>hadoop-unit-client-spark</artifactId>
        <version>1.3</version>
        <scope>test</scope>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <configuration>
                <excludes>
                    <exclude>**/*IntegrationTest.java</exclude>
                </excludes>
            </configuration>
            <executions>
                <execution>
                    <id>integration-test</id>
                    <goals>
                        <goal>test</goal>
                    </goals>
                    <phase>integration-test</phase>
                    <configuration>
                        <excludes>
                            <exclude>none</exclude>
                        </excludes>
                        <includes>
                            <include>**/*IntegrationTest.java</include>
                        </includes>
                    </configuration>
                </execution>
            </executions>
        </plugin>

        <plugin>
            <artifactId>hadoop-unit-maven-plugin</artifactId>
            <groupId>fr.jetoile.hadoop</groupId>
            <version>1.3</version>
            <executions>
                <execution>
                    <id>start</id>
                    <goals>
                        <goal>embedded-start</goal>
                    </goals>
                    <phase>pre-integration-test</phase>
                </execution>
            </executions>
            <configuration>
                <values>
                    <value>HDFS</value>
                    <value>ZOOKEEPER</value>
                    <value>HIVEMETA</value>
                    <value>HIVESERVER2</value>
                </values>
            </configuration>

        </plugin>

    </plugins>
</build>

Values can be:

  • HDFS
  • ZOOKEEPER
  • HIVEMETA
  • HIVESERVER2
  • SOLR
  • SOLRCLOUD
  • OOZIE
  • KAFKA
  • HBASE
  • MONGODB
  • CASSANDRA
  • ELASTICSEARCH

Here is a sample integration test:

public class HdfsBootstrapIntegrationTest {

    static private Configuration configuration;


    @BeforeClass
    public static void setup() throws BootstrapException {
        try {
            configuration = new PropertiesConfiguration("default.properties");
        } catch (ConfigurationException e) {
            throw new BootstrapException("bad config", e);
        }
    }


    @Test
    public void hdfsShouldStart() throws Exception {

        FileSystem hdfsFsHandle = HdfsUtils.INSTANCE.getFileSystem();


        FSDataOutputStream writer = hdfsFsHandle.create(new Path(configuration.getString(Config.HDFS_TEST_FILE_KEY)));
        writer.writeUTF(configuration.getString(Config.HDFS_TEST_STRING_KEY));
        writer.close();

        // Read the file and compare to test string
        FSDataInputStream reader = hdfsFsHandle.open(new Path(configuration.getString(Config.HDFS_TEST_FILE_KEY)));
        assertEquals(configuration.getString(Config.HDFS_TEST_STRING_KEY), reader.readUTF());
        reader.close();
        hdfsFsHandle.close();

        URL url = new URL(
                String.format( "http://localhost:%s/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=guest",
                        configuration.getInt( Config.HDFS_NAMENODE_HTTP_PORT_KEY ) ) );
        URLConnection connection = url.openConnection();
        connection.setRequestProperty( "Accept-Charset", "UTF-8" );
        BufferedReader response = new BufferedReader( new InputStreamReader( connection.getInputStream() ) );
        String line = response.readLine();
        response.close();
        assertThat(line).isEqualTo("{\"Path\":\"/user/guest\"}");
    }
}
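The last part of the test above calls the WebHDFS REST endpoint of the NameNode. The same request can be sketched with the JDK alone. `WebHdfsProbe` and its methods are hypothetical names, and the NameNode HTTP port is taken from the command line because its value depends on the configuration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop Unit.
class WebHdfsProbe {

    // Builds a WebHDFS URL; path is "" for the root or an absolute HDFS path.
    static String webHdfsUrl(String host, int httpPort, String path, String op, String user) {
        return "http://" + host + ":" + httpPort + "/webhdfs/v1" + path
                + "?op=" + op + "&user.name=" + user;
    }

    // Sends a GET request and returns the first line of the response body.
    static String fetchFirstLine(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setRequestProperty("Accept-Charset", "UTF-8");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: WebHdfsProbe <namenode-http-port>");
            return;
        }
        String url = webHdfsUrl("localhost", Integer.parseInt(args[0]), "", "GETHOMEDIRECTORY", "guest");
        try {
            // The test above expects the body {"Path":"/user/guest"} for this request.
            System.out.println(fetchFirstLine(url));
        } catch (IOException e) {
            System.err.println("Request to " + url + " failed: " + e.getMessage());
        }
    }
}
```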

##Remote mode

This plugin starts and stops a hadoop-unit-standalone instance installed on the local machine.

To use it, add something like this to the project's pom:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <configuration>
        <excludes>
            <exclude>**/*IntegrationTest.java</exclude>
        </excludes>
    </configuration>
    <executions>
        <execution>
            <id>integration-test</id>
            <goals>
                <goal>test</goal>
            </goals>
            <phase>integration-test</phase>
            <configuration>
                <excludes>
                    <exclude>none</exclude>
                </excludes>
                <includes>
                    <include>**/*IntegrationTest.java</include>
                </includes>
            </configuration>
        </execution>
    </executions>
</plugin>

<plugin>
    <artifactId>hadoop-unit-maven-plugin</artifactId>
    <groupId>fr.jetoile.hadoop</groupId>
    <version>1.3</version>
    <executions>
        <execution>
            <id>start</id>
            <goals>
                <goal>start</goal>
            </goals>
            <phase>pre-integration-test</phase>
        </execution>
    </executions>
    <configuration>
        <hadoopUnitPath>/home/khanh/tools/hadoop-unit-standalone</hadoopUnitPath>
        <exec>./hadoop-unit-standalone</exec>
        <values>
            <value>ZOOKEEPER</value>
            <value>HDFS</value>
            <value>HIVEMETA</value>
            <value>HIVESERVER2</value>            
        </values>
        <outputFile>/tmp/toto.txt</outputFile>
    </configuration>

</plugin>

<plugin>
    <artifactId>hadoop-unit-maven-plugin</artifactId>
    <groupId>fr.jetoile.hadoop</groupId>
    <version>1.3</version>
    <executions>
        <execution>
            <id>stop</id>
            <goals>
                <goal>stop</goal>
            </goals>
            <phase>post-integration-test</phase>
        </execution>
    </executions>
    <configuration>
        <hadoopUnitPath>/home/khanh/tools/hadoop-unit-standalone</hadoopUnitPath>
        <exec>./hadoop-unit-standalone</exec>
        <outputFile>/tmp/toto.txt</outputFile>
    </configuration>

</plugin>

Values can be:

  • HDFS
  • ZOOKEEPER
  • HIVEMETA
  • HIVESERVER2
  • SOLR
  • SOLRCLOUD
  • OOZIE
  • KAFKA
  • HBASE
  • MONGODB
  • CASSANDRA
  • ELASTICSEARCH

hadoopUnitPath is not mandatory, but if it is not set, the system environment variable HADOOP_UNIT_HOME must be defined.

The exec variable is optional.

If both are set, HADOOP_UNIT_HOME overrides hadoopUnitPath.

Warning: This plugin will modify hadoop.properties and delete hadoop unit logs.
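The precedence between HADOOP_UNIT_HOME and hadoopUnitPath described above can be illustrated as follows. This mirrors the rule as stated in this README and is not taken from the plugin's source. `HadoopUnitHomeResolver` is a hypothetical name, and treating a blank value as unset is an assumption.

```java
// Hypothetical illustration of the documented precedence, not the plugin's code.
class HadoopUnitHomeResolver {

    // Returns the installation path: the environment value wins over the pom value.
    static String resolve(String envHome, String configuredPath) {
        if (isSet(envHome)) {
            return envHome;
        }
        if (isSet(configuredPath)) {
            return configuredPath;
        }
        throw new IllegalStateException("Neither HADOOP_UNIT_HOME nor hadoopUnitPath is set");
    }

    // Assumption: null and blank strings both count as "not set".
    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        String configured = args.length > 0 ? args[0] : null;
        try {
            System.out.println(resolve(System.getenv("HADOOP_UNIT_HOME"), configured));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```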

Here is a sample integration test:

public class HdfsBootstrapIntegrationTest {

    static private Configuration configuration;


    @BeforeClass
    public static void setup() throws BootstrapException {
        try {
            configuration = new PropertiesConfiguration("default.properties");
        } catch (ConfigurationException e) {
            throw new BootstrapException("bad config", e);
        }
    }


    @Test
    public void hdfsShouldStart() throws Exception {

        FileSystem hdfsFsHandle = HdfsUtils.INSTANCE.getFileSystem();


        FSDataOutputStream writer = hdfsFsHandle.create(new Path(configuration.getString(Config.HDFS_TEST_FILE_KEY)));
        writer.writeUTF(configuration.getString(Config.HDFS_TEST_STRING_KEY));
        writer.close();

        // Read the file and compare to test string
        FSDataInputStream reader = hdfsFsHandle.open(new Path(configuration.getString(Config.HDFS_TEST_FILE_KEY)));
        assertEquals(configuration.getString(Config.HDFS_TEST_STRING_KEY), reader.readUTF());
        reader.close();
        hdfsFsHandle.close();

        URL url = new URL(
                String.format( "http://localhost:%s/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=guest",
                        configuration.getInt( Config.HDFS_NAMENODE_HTTP_PORT_KEY ) ) );
        URLConnection connection = url.openConnection();
        connection.setRequestProperty( "Accept-Charset", "UTF-8" );
        BufferedReader response = new BufferedReader( new InputStreamReader( connection.getInputStream() ) );
        String line = response.readLine();
        response.close();
        assertThat(line).isEqualTo("{\"Path\":\"/user/guest\"}");
    }
}

#Available components

  • SolrCloud 5.4.1
  • Kafka
  • Hive (metastore and server2)
  • Hdfs
  • Zookeeper
  • Oozie (WIP)
  • HBase
  • MongoDB
  • Cassandra 3.4
  • ElasticSearch 5.0-alpha2

Built on:

Use:

Todo:

  • make client utils for Kafka produce/consume
  • make sample with spark streaming and kafka

Issues:

#License

This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.