LinkedIn Gradle Plugin for Apache Hadoop

The LinkedIn Gradle Plugin for Apache Hadoop (which we shall refer to as simply the "Hadoop Plugin" for brevity) will help you more effectively build, test and deploy Hadoop applications.

In particular, the Plugin will help you easily work with Hadoop applications like Apache Pig and build workflows for Hadoop workflow schedulers like Azkaban and Apache Oozie.

The Plugin includes the LinkedIn Gradle DSL for Apache Hadoop (which we shall refer to as simply the "Hadoop DSL" for brevity), a language for specifying jobs and workflows for Hadoop workflow schedulers like Azkaban and Apache Oozie.

Hadoop Plugin User Guide

The Hadoop Plugin User Guide is available at [User Guide](https://github.com/linkedin/linkedin-gradle-plugin-for-apache-hadoop/wiki/User-Guide).

Hadoop DSL Language Reference

The Hadoop DSL Language Reference is available at [Hadoop DSL Language Reference](https://github.com/linkedin/linkedin-gradle-plugin-for-apache-hadoop/wiki/Hadoop-DSL-Language-Reference).

Getting the Hadoop Plugin

The Hadoop Plugin is published on [plugins.gradle.org](https://plugins.gradle.org/plugin/com.linkedin.gradle.hadoop.HadoopPlugin). Follow the link for a short snippet to add to your build.gradle file to start using the Hadoop Plugin.
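
For orientation, the snippet on that page should look roughly like the sketch below, which uses the standard Gradle plugins block. The plugin id is taken from the plugins.gradle.org URL above; the version string "x.y.z" is a placeholder, not a real release, so copy the actual version from the plugin page.

```groovy
// build.gradle (sketch only)
plugins {
  // Plugin id as it appears in the plugins.gradle.org URL.
  // "x.y.z" is a placeholder: replace it with the version listed on the plugin page.
  id "com.linkedin.gradle.hadoop.HadoopPlugin" version "x.y.z"
}
```

The plugins block requires a Gradle version that supports it; for other setups, the plugin page is the authoritative source for the exact snippet.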

Project Structure

The project is structured as follows:

  • hadoop-plugin: Code for the various plugins that comprise the Hadoop Plugin
  • hadoop-plugin-test: Test cases for the Hadoop Plugin
  • li-hadoop-plugin: LinkedIn-specific extensions to the Hadoop Plugin
  • li-hadoop-plugin-test: Test cases for the LinkedIn-specific extensions to the Hadoop Plugin

Although the li-hadoop-plugin code is generally specific to LinkedIn, it is included in the project to show you how to use subclassing to extend the core functionality of the Hadoop Plugin.

Building and Running Test Cases

To build the Plugin and run the test cases, run ./gradlew build from the top-level project directory.

To see all the test tasks, run ./gradlew tasks from the top-level project directory. You can run an individual test with ./gradlew test_testName. You can also run multiple tests by running ./gradlew test_testName1 ... test_testNameN.

Apache Oozie Status

Although we started on a Hadoop DSL compiler for Oozie, we did not complete it, and it is currently not in a usable form. We are not currently working on it, although it is possible we might go back and finish it in the future.

Recent News

  • April 2016 We have refreshed the User Guide and Hadoop DSL Language Reference Wiki pages
  • January 2016 The Hadoop Plugin is now published on plugins.gradle.org
  • November 2015 Gradle version bumped to 2.7 and the Gradle daemon enabled - tests run much, much faster
  • August 2015 Initial pull requests for Oozie versioned deployments and the Oozie Hadoop DSL compiler have been merged
  • August 2015 The Hadoop Plugin and Hadoop DSL were released on GitHub! See the LinkedIn Engineering Blog post for the announcement!
  • July 2015 See our talk at the Gradle Summit