GoogleCloudPlatform/oozie-to-airflow

Support uber.jar feature

potiuk opened this issue · 0 comments

We should add support for the uber.jar feature of Oozie.

Supporting Uber JAR
The term uber JAR in this context refers to the Hadoop concept of the same name. In general, users package their application-specific custom classes into a JAR, and they might also need a set of third-party JARs for their application. Hadoop lets users submit a single uber JAR that contains the custom classes along with the third-party JARs under the lib/ subdirectory of the JAR. This is purely a convenience feature: the user includes the uber JAR during Hadoop job submission, and Hadoop, which understands this predefined directory structure, adds all the JARs from the lib/ subdirectory of the uber JAR to the application classpath.
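To make the directory layout concrete, here is a minimal Python sketch that lists the JARs bundled under lib/ in an uber JAR. The helper names and the archive entries (`MyMapper.class`, `third-party-a.jar`, `third-party-b.jar`) are made up for illustration; only the lib/ convention comes from the description above.

```python
import io
import zipfile


def bundled_jars(uber_jar):
    """Return the sorted names of JARs stored under lib/ inside an uber JAR.

    `uber_jar` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(uber_jar) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.startswith("lib/") and name.endswith(".jar")
        )


def build_example_uber_jar():
    """Build a tiny in-memory archive with the uber JAR layout (hypothetical entries)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Application-specific class at the top level of the JAR.
        archive.writestr("com/example/MyMapper.class", b"")
        # Third-party dependencies under lib/.
        archive.writestr("lib/third-party-a.jar", b"")
        archive.writestr("lib/third-party-b.jar", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in bundled_jars(build_example_uber_jar()):
        print(name)
```

A JAR is a ZIP archive, so the standard `zipfile` module is enough to inspect it; nothing here depends on Hadoop being installed.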
When users write MapReduce code natively in Hadoop, the uber JAR has to be injected using the conf.setJar() call; setJar() is how the user sets the JAR for a MapReduce job. Oozie also supports this type of MapReduce job. To enable it, the Oozie administrator has to turn on the feature in oozie-site.xml as shown here (the feature is turned off by default):

<configuration>
   <property>
     <name>oozie.action.mapreduce.uber.jar.enable</name>
     <value>true</value>
   </property>
</configuration>
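A converter or a pre-flight check might want to confirm that this flag is set before relying on the feature. The following Python sketch reads an oozie-site.xml document and reports whether the property is true. The function name `uber_jar_enabled` is hypothetical and is not part of Oozie or oozie-to-airflow; only the property name is taken from the snippet above.

```python
import xml.etree.ElementTree as ET

UBER_JAR_ENABLE = "oozie.action.mapreduce.uber.jar.enable"


def uber_jar_enabled(oozie_site_xml):
    """Return True if the uber JAR feature is switched on in oozie-site.xml.

    `oozie_site_xml` is the file content as a string. A missing property is
    treated as disabled, matching the documented default.
    """
    root = ET.fromstring(oozie_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == UBER_JAR_ENABLE:
            value = (prop.findtext("value") or "").strip()
            return value.lower() == "true"
    return False


EXAMPLE_SITE = """
<configuration>
   <property>
     <name>oozie.action.mapreduce.uber.jar.enable</name>
     <value>true</value>
   </property>
</configuration>
"""

if __name__ == "__main__":
    print(uber_jar_enabled(EXAMPLE_SITE))
```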

For the workflow job, the user needs to copy the uber JAR into an HDFS location and then define the full HDFS path of the uber JAR through action configuration as shown here (when Oozie launches this MapReduce job, it injects the uber JAR on behalf of the user by calling the conf.setJar() method):

<map-reduce>
   <job-tracker>${jobTracker}</job-tracker>
   <name-node>${nameNode}</name-node>
   <configuration>
     <property>
       <name>oozie.mapreduce.uber.jar</name>
       <value>${MY_HDFS_PATH_TO_UBER_JAR}/my-uber-jar.jar</value>
     </property>
   </configuration>
</map-reduce>
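As a starting point for the conversion, the uber JAR path could be pulled out of the action configuration roughly as follows. This is only a sketch: `find_uber_jar` and `local_name` are hypothetical helpers, not existing oozie-to-airflow code, and how the resulting path should be passed on to the generated Airflow task is left open. Tags are compared by local name on the assumption that real workflow files declare an XML namespace, unlike the fragment above.

```python
import xml.etree.ElementTree as ET

UBER_JAR_PROPERTY = "oozie.mapreduce.uber.jar"


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def find_uber_jar(map_reduce_xml):
    """Return the configured uber JAR path of a <map-reduce> action, or None.

    The value is returned as written, so EL expressions such as
    ${MY_HDFS_PATH_TO_UBER_JAR} are left unresolved.
    """
    root = ET.fromstring(map_reduce_xml)
    for element in root.iter():
        if local_name(element.tag) != "property":
            continue
        # Collect the <name> and <value> children of this <property>.
        fields = {
            local_name(child.tag): (child.text or "").strip()
            for child in element
        }
        if fields.get("name") == UBER_JAR_PROPERTY:
            return fields.get("value")
    return None


EXAMPLE_ACTION = """
<map-reduce>
   <job-tracker>${jobTracker}</job-tracker>
   <name-node>${nameNode}</name-node>
   <configuration>
     <property>
       <name>oozie.mapreduce.uber.jar</name>
       <value>${MY_HDFS_PATH_TO_UBER_JAR}/my-uber-jar.jar</value>
     </property>
   </configuration>
</map-reduce>
"""

if __name__ == "__main__":
    print(find_uber_jar(EXAMPLE_ACTION))
```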