sakserv/hadoop-mini-clusters

oozie shared lib

Closed this issue · 2 comments

Hi,
I am trying to do the same thing, creating a LocalOozie instance to test my Oozie workflows.
However, when I try to run a workflow against it, I get an error in the Java action because it says "Could not find Oozie Share Lib".
How does your Oozie server create the share lib folder?

thanks
mohnish

@mohnishkodnani - My sincere apologies for the long delay here. I finally had time to work on adding Share Lib support.

Adding Share Lib support turned out to be trickier than I had expected. Each release has different jars in the Share Lib, and each Share Lib tarball is 200+ MB, so simply bundling every release into the project was way too much bloat. The Share Lib tarball is included in the Oozie release tarball, which can be downloaded from the HWX repos, so that is the approach I took.

I've added the resource sharelib.properties, which contains the Oozie release tarball URL for each supported release.

Several additional properties have been added to the OozieLocalServer to support the addition of Share Lib.

.setOozieHdfsShareLibDir("/tmp/oozie_share_lib")
.setOozieShareLibCreate(Boolean.TRUE)
.setOozieLocalShareLibCacheDir("share_lib_cache")
.setOoziePurgeLocalShareLibCache(Boolean.FALSE)
  • setOozieHdfsShareLibDir - final Share Lib directory in HDFS
  • setOozieShareLibCreate - Should the sharelib be created at all?
  • setOozieLocalShareLibCacheDir - Where to cache the Oozie release tarball on the local system
  • setOoziePurgeLocalShareLibCache - Should the Oozie release tarball cache directory be purged after testing completes? This is necessary when testing multiple profiles.
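
Purging the local cache (the last option above) essentially amounts to recursively deleting the cache directory between profile runs. Here is a minimal sketch of that idea; it is illustrative only and not the project's actual OozieShareLibUtil implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class ShareLibCachePurge {

    // Illustrative sketch: recursively delete the local Share Lib cache
    // directory, children first, so the next profile starts clean.
    static void purgeCache(Path cacheDir) throws IOException {
        if (!Files.exists(cacheDir)) {
            return; // nothing cached, nothing to purge
        }
        try (Stream<Path> paths = Files.walk(cacheDir)) {
            paths.sorted(Comparator.reverseOrder()) // files before their parent dirs
                 .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Path cacheDir = Files.createTempDirectory("share_lib_cache");
        Files.createFile(cacheDir.resolve("oozie-sharelib.tar.gz"));
        purgeCache(cacheDir);
        System.out.println(Files.exists(cacheDir)); // prints false
    }
}
```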

You can choose to cache the Oozie release tarball to avoid downloading it for each test run, saving time and bandwidth. The Oozie release version to download is determined automatically by the Maven profile; the tarball is downloaded and the Share Lib is extracted into HDFS. The HDFS path will be set in the OozieLocalServer configuration.
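For illustration, the version-to-tarball mapping can be pictured as a plain properties lookup keyed by the Oozie release. The keys and URLs below are made up for the sketch; the real sharelib.properties shipped with the project will differ:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ShareLibUrlLookup {

    // Hypothetical contents standing in for sharelib.properties;
    // the actual keys and URLs in the project may differ.
    static final String EXAMPLE_PROPERTIES =
            "oozie-4.2.0=https://example.com/oozie-4.2.0.tar.gz\n" +
            "oozie-4.1.0=https://example.com/oozie-4.1.0.tar.gz\n";

    // Resolve an Oozie version to its release tarball URL,
    // or null if the version is not listed.
    static String tarballUrlFor(String oozieVersion) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(EXAMPLE_PROPERTIES));
        return props.getProperty("oozie-" + oozieVersion);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tarballUrlFor("4.2.0")); // prints the 4.2.0 URL
    }
}
```

In the real project the version key would come from the active Maven profile rather than a hard-coded string.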

if (oozieShareLibCreate) {
    configuration.set("oozie.service.WorkflowAppService.system.libpath",
             oozieHdfsDefaultFs + oozieHdfsShareLibDir);
    configuration.set("use.system.libpath.for.mapreduce.and.pig.jobs", "true");
}

Since an HDFS cluster is needed to store the Share Lib, you need to handle creating the Share Lib yourself in your tests, like this:

// Get the HDFS FS handle
FileSystem hdfsFs = hdfsLocalCluster.getHdfsFileSystemHandle();

// Instantiate oozieShareLibUtil with the hdfs share lib dir, share lib create bool, local cache dir, purge bool, and the HDFS FS handle
OozieShareLibUtil oozieShareLibUtil = new OozieShareLibUtil(oozieLocalServer.getOozieHdfsShareLibDir(),
    oozieLocalServer.getOozieShareLibCreate(),
    oozieLocalServer.getOozieLocalShareLibCacheDir(),
    oozieLocalServer.getOoziePurgeLocalShareLibCache(), hdfsFs);

// Download, extract, and upload Share Lib to HDFS
oozieShareLibUtil.createShareLib();

I'll close this ticket once the changes are merged. Please open a new issue if you run into trouble with these changes.

Thanks!

I also wanted to know: when you create a YARN cluster, where do the Hadoop jars get copied so that they are on the classpath when the application runs?