hadoop-mini-clusters provides an easy way to test Hadoop projects directly in your IDE, without the need for a full-blown development cluster or container orchestration. Because the clusters run locally, you can debug with the full power of the IDE. The project wraps the existing Mini Clusters across the ecosystem in a consistent API, so you do not have to learn the nuances of each project's approach.
The project structure changed with 0.1.0. Each mini cluster now resides in a module of its own. See the module names below.
- hadoop-mini-clusters-hdfs - Mini HDFS Cluster
- hadoop-mini-clusters-yarn - Mini YARN Cluster (no MR)
- hadoop-mini-clusters-mapreduce - Mini MapReduce Cluster
- hadoop-mini-clusters-hbase - Mini HBase Cluster
- hadoop-mini-clusters-zookeeper - Curator based Local Cluster
- hadoop-mini-clusters-hiveserver2 - Local HiveServer2 instance
- hadoop-mini-clusters-hivemetastore - Derby backed HiveMetaStore
- hadoop-mini-clusters-storm - Storm LocalCluster
- hadoop-mini-clusters-kafka - Local Kafka Broker
- hadoop-mini-clusters-oozie - Local Oozie Server - Thanks again Vladimir
- hadoop-mini-clusters-mongodb - I know... not Hadoop
- hadoop-mini-clusters-activemq - Local ActiveMQ Broker - Thanks Vladimir Zlatkin!
- hadoop-mini-clusters-hyperscaledb - Local HSQLDB server, for testing various databases
- hadoop-mini-clusters-knox - Local Knox Gateway
- hadoop-mini-clusters-kdc - Local Key Distribution Center (KDC)
Tests are included to show how to configure and use each of the mini clusters. See the *IntegrationTest classes.
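As a rough illustration of the pattern such tests tend to follow, the sketch below starts a cluster, runs a test body, and always stops the cluster afterwards. It is a minimal sketch, not the project's own test code: the `Cluster` interface, `RecordingCluster`, and `withCluster` are hypothetical stand-ins so the example runs without Hadoop on the classpath, and the `stop()` method is an assumption, since the snippets in this README only show `start()`.

```java
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {

    // Hypothetical stand-in for a mini cluster; not the library's own interface.
    interface Cluster {
        void start() throws Exception;
        void stop() throws Exception; // assumed counterpart to start()
    }

    // Stub that records lifecycle events so the sketch runs without Hadoop.
    static final class RecordingCluster implements Cluster {
        private final String name;
        private final List<String> events;

        RecordingCluster(String name, List<String> events) {
            this.name = name;
            this.events = events;
        }

        @Override public void start() { events.add("start " + name); }
        @Override public void stop() { events.add("stop " + name); }
    }

    // Start the cluster, run the test body, and stop the cluster even if the body throws.
    static void withCluster(Cluster cluster, Runnable testBody) throws Exception {
        cluster.start();
        try {
            testBody.run();
        } finally {
            cluster.stop();
        }
    }

    // Runs one start / test / stop cycle and returns the recorded events in order.
    static List<String> demo() throws Exception {
        List<String> events = new ArrayList<>();
        withCluster(new RecordingCluster("hdfs", events), () -> events.add("test body"));
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

In a real test, the stub would be replaced by one of the mini clusters built in the sections of this README, with the start and stop calls typically placed in the test framework's setup and teardown hooks.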
- Maven Central - latest release
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters</artifactId>
<version>0.1.14-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-common</artifactId>
<version>0.1.14-SNAPSHOT</version>
</dependency>
Multiple versions of HDP are available. The current list is:
- HDP 2.6.2.0 (default)
- HDP 2.6.1.0
- HDP 2.6.0.3
- HDP 2.5.3.0
- HDP 2.5.0.0
- HDP 2.4.2.0
- HDP 2.4.0.0
- HDP 2.3.4.0
- HDP 2.3.2.0
- HDP 2.3.0.0
To use a different profile, add the profile name to your Maven build:
mvn test -P2.3.0.0
Note that backwards compatibility is not guaranteed.
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-hdfs</artifactId>
<version>0.1.13</version>
</dependency>
HdfsLocalCluster hdfsLocalCluster = new HdfsLocalCluster.Builder()
.setHdfsNamenodePort(12345)
.setHdfsNamenodeHttpPort(12341)
.setHdfsTempDir("embedded_hdfs")
.setHdfsNumDatanodes(1)
.setHdfsEnablePermissions(false)
.setHdfsFormat(true)
.setHdfsEnableRunningUserAsProxyUser(true)
.setHdfsConfig(new Configuration())
.build();
hdfsLocalCluster.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-yarn</artifactId>
<version>0.1.13</version>
</dependency>
YarnLocalCluster yarnLocalCluster = new YarnLocalCluster.Builder()
.setNumNodeManagers(1)
.setNumLocalDirs(1)
.setNumLogDirs(1)
.setResourceManagerAddress("localhost:37001")
.setResourceManagerHostname("localhost")
.setResourceManagerSchedulerAddress("localhost:37002")
.setResourceManagerResourceTrackerAddress("localhost:37003")
.setResourceManagerWebappAddress("localhost:37004")
.setUseInJvmContainerExecutor(false)
.setConfig(new Configuration())
.build();
yarnLocalCluster.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-mapreduce</artifactId>
<version>0.1.13</version>
</dependency>
MRLocalCluster mrLocalCluster = new MRLocalCluster.Builder()
.setNumNodeManagers(1)
.setJobHistoryAddress("localhost:37005")
.setResourceManagerAddress("localhost:37001")
.setResourceManagerHostname("localhost")
.setResourceManagerSchedulerAddress("localhost:37002")
.setResourceManagerResourceTrackerAddress("localhost:37003")
.setResourceManagerWebappAddress("localhost:37004")
.setUseInJvmContainerExecutor(false)
.setConfig(new Configuration())
.build();
mrLocalCluster.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-hbase</artifactId>
<version>0.1.13</version>
</dependency>
HbaseLocalCluster hbaseLocalCluster = new HbaseLocalCluster.Builder()
.setHbaseMasterPort(25111)
.setHbaseMasterInfoPort(-1)
.setNumRegionServers(1)
.setHbaseRootDir("embedded_hbase")
.setZookeeperPort(12345)
.setZookeeperConnectionString("localhost:12345")
.setZookeeperZnodeParent("/hbase-unsecure")
.setHbaseWalReplicationEnabled(false)
.setHbaseConfiguration(new Configuration())
.activeRestGateway()
.setHbaseRestHost("localhost")
.setHbaseRestPort(28000)
.setHbaseRestReadOnly(false)
.setHbaseRestThreadMax(100)
.setHbaseRestThreadMin(2)
.build() // finishes the REST gateway settings and returns to the HBase cluster builder
.build();
hbaseLocalCluster.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-zookeeper</artifactId>
<version>0.1.13</version>
</dependency>
ZookeeperLocalCluster zookeeperLocalCluster = new ZookeeperLocalCluster.Builder()
.setPort(12345)
.setTempDir("embedded_zookeeper")
.setZookeeperConnectionString("localhost:12345")
.setMaxClientCnxns(60)
.setElectionPort(20001)
.setQuorumPort(20002)
.setDeleteDataDirectoryOnClose(false)
.setServerId(1)
.setTickTime(2000)
.build();
zookeeperLocalCluster.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-hiveserver2</artifactId>
<version>0.1.13</version>
</dependency>
HiveLocalServer2 hiveLocalServer2 = new HiveLocalServer2.Builder()
.setHiveServer2Hostname("localhost")
.setHiveServer2Port(12348)
.setHiveMetastoreHostname("localhost")
.setHiveMetastorePort(12347)
.setHiveMetastoreDerbyDbDir("metastore_db")
.setHiveScratchDir("hive_scratch_dir")
.setHiveWarehouseDir("warehouse_dir")
.setHiveConf(new HiveConf())
.setZookeeperConnectionString("localhost:12345")
.build();
hiveLocalServer2.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-hivemetastore</artifactId>
<version>0.1.13</version>
</dependency>
HiveLocalMetaStore hiveLocalMetaStore = new HiveLocalMetaStore.Builder()
.setHiveMetastoreHostname("localhost")
.setHiveMetastorePort(12347)
.setHiveMetastoreDerbyDbDir("metastore_db")
.setHiveScratchDir("hive_scratch_dir")
.setHiveWarehouseDir("warehouse_dir")
.setHiveConf(new HiveConf())
.build();
hiveLocalMetaStore.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-storm</artifactId>
<version>0.1.13</version>
</dependency>
StormLocalCluster stormLocalCluster = new StormLocalCluster.Builder()
.setZookeeperHost("localhost")
.setZookeeperPort(12345)
.setEnableDebug(true)
.setNumWorkers(1)
.setStormConfig(new Config())
.build();
stormLocalCluster.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-kafka</artifactId>
<version>0.1.13</version>
</dependency>
KafkaLocalBroker kafkaLocalBroker = new KafkaLocalBroker.Builder()
.setKafkaHostname("localhost")
.setKafkaPort(11111)
.setKafkaBrokerId(0)
.setKafkaProperties(new Properties())
.setKafkaTempDir("embedded_kafka")
.setZookeeperConnectionString("localhost:12345")
.build();
kafkaLocalBroker.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-oozie</artifactId>
<version>0.1.13</version>
</dependency>
OozieLocalServer oozieLocalServer = new OozieLocalServer.Builder()
.setOozieTestDir("embedded_oozie")
.setOozieHomeDir("oozie_home")
.setOozieUsername(System.getProperty("user.name"))
.setOozieGroupname("testgroup")
.setOozieYarnResourceManagerAddress("localhost")
.setOozieHdfsDefaultFs("hdfs://localhost:8020/")
.setOozieConf(new Configuration())
.setOozieHdfsShareLibDir("/tmp/oozie_share_lib")
.setOozieShareLibCreate(Boolean.TRUE)
.setOozieLocalShareLibCacheDir("share_lib_cache")
.setOoziePurgeLocalShareLibCache(Boolean.FALSE)
.setOozieShareLibFrameworks(
Lists.newArrayList(Framework.MAPREDUCE_STREAMING, Framework.OOZIE))
.build();
OozieShareLibUtil oozieShareLibUtil = new OozieShareLibUtil(
oozieLocalServer.getOozieHdfsShareLibDir(),
oozieLocalServer.getOozieShareLibCreate(),
oozieLocalServer.getOozieLocalShareLibCacheDir(),
oozieLocalServer.getOoziePurgeLocalShareLibCache(),
hdfsLocalCluster.getHdfsFileSystemHandle(),
oozieLocalServer.getOozieShareLibFrameworks());
oozieShareLibUtil.createShareLib();
oozieLocalServer.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-mongodb</artifactId>
<version>0.1.13</version>
</dependency>
MongodbLocalServer mongodbLocalServer = new MongodbLocalServer.Builder()
.setIp("127.0.0.1")
.setPort(11112)
.build();
mongodbLocalServer.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-activemq</artifactId>
<version>0.1.13</version>
</dependency>
ActivemqLocalBroker amq = new ActivemqLocalBroker.Builder()
.setHostName("localhost")
.setPort(11113)
.setQueueName("defaultQueue")
.setStoreDir("activemq-data")
.setUriPrefix("vm://")
.setUriPostfix("?create=false")
.build();
amq.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-hyperscaledb</artifactId>
<version>0.1.13</version>
</dependency>
HsqldbLocalServer hsqldbLocalServer = new HsqldbLocalServer.Builder()
.setHsqldbHostName("127.0.0.1")
.setHsqldbPort("44111")
.setHsqldbTempDir("embedded_hsqldb")
.setHsqldbDatabaseName("testdb")
.setHsqldbCompatibilityMode("mysql")
.setHsqldbJdbcDriver("org.hsqldb.jdbc.JDBCDriver")
.setHsqldbJdbcConnectionStringPrefix("jdbc:hsqldb:hsql://")
.build();
hsqldbLocalServer.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-knox</artifactId>
<version>0.1.13</version>
</dependency>
KnoxLocalCluster knoxCluster = new KnoxLocalCluster.Builder()
.setPort(8888)
.setPath("gateway")
.setHomeDir("embedded_knox")
.setCluster("mycluster")
.setTopology(XMLDoc.newDocument(true)
.addRoot("topology")
.addTag("gateway")
.addTag("provider")
.addTag("role").addText("authentication")
.addTag("enabled").addText("false")
.gotoParent()
.addTag("provider")
.addTag("role").addText("identity-assertion")
.addTag("enabled").addText("false")
.gotoParent()
.gotoParent()
.addTag("service")
.addTag("role").addText("NAMENODE")
.addTag("url").addText("hdfs://localhost:8020")
.gotoParent()
.addTag("service")
.addTag("role").addText("WEBHDFS")
.addTag("url").addText("http://localhost:50070/webhdfs")
.gotoRoot().toString())
.build();
knoxCluster.start();
<dependency>
<groupId>com.github.sakserv</groupId>
<artifactId>hadoop-mini-clusters-kdc</artifactId>
<version>0.1.14-SNAPSHOT</version>
</dependency>
KdcLocalCluster kdcLocalCluster = new KdcLocalCluster.Builder()
.setPort(34340)
.setHost("127.0.0.1")
.setBaseDir("embedded_kdc")
.setOrgDomain("ORG")
.setOrgName("ACME")
.setPrincipals("hdfs,hbase,yarn,oozie,oozie_user,zookeeper,storm,mapreduce,HTTP".split(","))
.setKrbInstance("127.0.0.1")
.setInstance("DefaultKrbServer")
.setTransport("TCP")
.setMaxTicketLifetime(86400000)
.setMaxRenewableLifetime(604800000)
.setDebug(false)
.build();
kdcLocalCluster.start();
For examples of integrating the KDC with HDFS, Zookeeper, or HBase, see the tests under hadoop-mini-clusters-kdc/src/test/java/com/github/sakserv/minicluster/impl
To change the defaults used to construct the mini clusters, modify src/main/java/resources/default.properties as needed.
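To make the idea concrete, here is a minimal sketch of reading builder values from properties-formatted text with `java.util.Properties`. It is an assumption about how one might consume such a file, not the project's own loading code, and the keys `zookeeper.port` and `zookeeper.temp.dir` are hypothetical; check default.properties for the real key names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class DefaultPropertiesSketch {

    // Parse key=value pairs from properties-formatted text.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Read a required integer property, failing loudly if it is absent.
    static int intProperty(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return Integer.parseInt(value.trim());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical keys and values, inlined so the sketch is self-contained.
        Properties props = load(
                "zookeeper.port=12345\n"
              + "zookeeper.temp.dir=embedded_zookeeper\n");

        int port = intProperty(props, "zookeeper.port");
        String tempDir = props.getProperty("zookeeper.temp.dir");

        // These values would then be passed to a builder,
        // e.g. .setPort(port).setTempDir(tempDir).
        System.out.println(port + " " + tempDir);
    }
}
```

Keeping ports and directories in one properties file, rather than hard-coding them in each test, makes it easier to keep settings such as the Zookeeper connection string consistent across the clusters that share it.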
To run the full test suite from IntelliJ, make sure Fork Mode is set to method (Run -> Edit Configurations -> fork mode).
YarnLocalCluster now supports Oleg Z's InJvmContainerExecutor. See Oleg Z's GitHub for more.