Provides a simple archetype to create MapReduce jobs with Maven.

Maven Archetype for Hadoop

This project is a small template for quickly creating a new Maven-based project that builds Hadoop MapReduce job jars.

It uses the Cloudera Maven repository to resolve the dependencies for Hadoop-related artifacts.

Building

The process is simple: clone this project and create an archetype jar from it like so:

$ cd /tmp
$ git clone git@github.com:larsgeorge/maven-archetype-hadoop.git
Initialized empty Git repository in /private/tmp/maven-archetype-hadoop/.git/
remote: Counting objects: 13, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 13 (delta 2), reused 0 (delta 0)
Receiving objects: 100% (13/13), done.
Resolving deltas: 100% (2/2), done.

$ cd maven-archetype-hadoop/
$ mvn archetype:create-from-project
[INFO] Scanning for projects...
[INFO] Searching repository for plugin with prefix: 'archetype'.
[INFO] ------------------------------------------------------------------------
[INFO] Building mapred
[INFO]    task-segment: [archetype:create-from-project] (aggregator-style)
[INFO] ------------------------------------------------------------------------
[INFO] Preparing archetype:create-from-project
[INFO] ------------------------------------------------------------------------
[INFO] Building mapred
[INFO] ------------------------------------------------------------------------
[INFO] No goals needed for project - skipping
[INFO] [archetype:create-from-project {execution: default-cli}]
[INFO] Setting default groupId: com.larsgeorge
[INFO] Setting default artifactId: mapred
[INFO] Setting default version: 1.0-SNAPSHOT
[INFO] Setting default package: com.larsgeorge
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building mapred-archetype
[INFO]    task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [resources:resources {execution: default-resources}]
[INFO] Copying 5 resources
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Copying 2 resources
[INFO] [archetype:jar {execution: default-jar}]
[INFO] [archetype:add-archetype-metadata {execution: default-add-archetype-metadata}]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2 seconds
[INFO] Finished at: Fri Dec 03 01:10:16 CET 2010
[INFO] Final Memory: 20M/81M
[INFO] ------------------------------------------------------------------------
[INFO] Archetype created in /private/tmp/maven-archetype-hadoop/target/generated-sources/archetype
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6 seconds
[INFO] Finished at: Fri Dec 03 01:10:16 CET 2010
[INFO] Final Memory: 17M/81M
[INFO] ------------------------------------------------------------------------

With that you have built the archetype. You now need to install it into your local repository:

$ cd target/generated-sources/archetype/
$ mvn install
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building mapred-archetype
[INFO]    task-segment: [install]
[INFO] ------------------------------------------------------------------------
[INFO] [resources:resources {execution: default-resources}]
[INFO] Copying 5 resources
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Copying 2 resources
[INFO] [archetype:jar {execution: default-jar}]
[INFO] [archetype:add-archetype-metadata {execution: default-add-archetype-metadata}]
[INFO] [archetype:integration-test {execution: default-integration-test}]
[INFO] [install:install {execution: default-install}]
[INFO] Installing /private/tmp/maven-archetype-hadoop/target/generated-sources/archetype/target/mapred-archetype-1.0-SNAPSHOT.jar to /Users/larsgeorge/.m2/repository/com/larsgeorge/mapred-archetype/1.0-SNAPSHOT/mapred-archetype-1.0-SNAPSHOT.jar
[INFO] [archetype:update-local-catalog {execution: default-update-local-catalog}]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6 seconds
[INFO] Finished at: Fri Dec 03 01:11:34 CET 2010
[INFO] Final Memory: 21M/81M
[INFO] ------------------------------------------------------------------------
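
Before moving on, it can help to confirm that the install step left the archetype where `archetype:generate` will look for it. The sketch below checks for the jar path shown in the log above and for an entry in a local catalog file. The helper name `check_archetype` is made up for illustration, and the two catalog locations are assumptions, since the location may differ between versions of the archetype plugin.

```shell
# check_archetype [M2_DIR]
# M2_DIR is the Maven user directory (defaults to ~/.m2).
check_archetype() {
  m2="${1:-$HOME/.m2}"
  jar="$m2/repository/com/larsgeorge/mapred-archetype/1.0-SNAPSHOT/mapred-archetype-1.0-SNAPSHOT.jar"

  # The jar path mirrors the "Installing ... to ..." line in the log.
  if [ -f "$jar" ]; then
    echo "archetype jar: found"
  else
    echo "archetype jar: missing"
    return 1
  fi

  # Assumption: the local catalog lives in one of these two places,
  # depending on the archetype plugin version.
  for catalog in "$m2/archetype-catalog.xml" "$m2/repository/archetype-catalog.xml"; do
    if [ -f "$catalog" ] && grep -q "mapred-archetype" "$catalog"; then
      echo "catalog entry: found in $catalog"
      return 0
    fi
  done
  echo "catalog entry: missing"
  return 1
}
```

Calling `check_archetype` with no argument inspects `~/.m2`; a different directory can be passed as the first argument.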

Now you can create a new project using this archetype:

$ cd /tmp
$ mvn archetype:generate -DarchetypeCatalog=local
[INFO] Scanning for projects...
[INFO] Searching repository for plugin with prefix: 'archetype'.
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Default Project
[INFO]    task-segment: [archetype:generate] (aggregator-style)
[INFO] ------------------------------------------------------------------------
[INFO] Preparing archetype:generate
[INFO] No goals needed for project - skipping
[INFO] [archetype:generate {execution: default-cli}]
[INFO] Generating project in Interactive mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
Choose archetype:
1: local -> mapred-archetype (mapred-archetype)
Choose a number: : 1
Define value for property 'groupId': : com.foobar
Define value for property 'artifactId': : mapred
Define value for property 'version': 1.0-SNAPSHOT:
Define value for property 'package': com.foobar:
Confirm properties configuration:
groupId: com.foobar
artifactId: mapred
version: 1.0-SNAPSHOT
package: com.foobar
Y: Y
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1 minute 3 seconds
[INFO] Finished at: Fri Dec 03 01:13:34 CET 2010
[INFO] Final Memory: 16M/81M
[INFO] ------------------------------------------------------------------------
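
The prompts above can also be answered on the command line, which is handy for scripts. This is a minimal sketch, not part of the original project: the archetype coordinates are taken from the install log, the project coordinates are the example values typed at the prompts, and the wrapper function `generate_project` with its `RUN` switch is a hypothetical convenience that prints the command unless told to execute it.

```shell
# Print (or, with RUN=1, execute) a non-interactive archetype:generate call.
generate_project() {
  set -- mvn archetype:generate \
    -DinteractiveMode=false \
    -DarchetypeCatalog=local \
    -DarchetypeGroupId=com.larsgeorge \
    -DarchetypeArtifactId=mapred-archetype \
    -DarchetypeVersion=1.0-SNAPSHOT \
    -DgroupId=com.foobar \
    -DartifactId=mapred \
    -Dversion=1.0-SNAPSHOT \
    -Dpackage=com.foobar

  if [ "${RUN:-0}" = "1" ]; then
    "$@"          # actually run Maven
  else
    echo "$*"     # dry run: show the command only
  fi
}

generate_project
```

Setting `RUN=1` in the environment before calling `generate_project` runs Maven instead of printing the command.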

Looking into the generated directory, we get:

$ ls -laR mapred
total 24
drwxr-xr-x   6 larsgeorge  wheel   204 Dec  3 00:43 .
drwxrwxrwt  19 root        wheel   646 Dec  3 00:43 ..
-rw-r--r--   1 larsgeorge  wheel    21 Dec  3 00:43 .gitignore
-rw-r--r--   1 larsgeorge  wheel   268 Dec  3 00:43 README.rst
-rw-r--r--   1 larsgeorge  wheel  1673 Dec  3 00:43 pom.xml
drwxr-xr-x   3 larsgeorge  wheel   102 Dec  3 00:43 src

mapred/src:
total 0
drwxr-xr-x  3 larsgeorge  wheel  102 Dec  3 00:43 .
drwxr-xr-x  6 larsgeorge  wheel  204 Dec  3 00:43 ..
drwxr-xr-x  3 larsgeorge  wheel  102 Dec  3 00:43 main

mapred/src/main:
total 0
drwxr-xr-x  3 larsgeorge  wheel  102 Dec  3 00:43 .
drwxr-xr-x  3 larsgeorge  wheel  102 Dec  3 00:43 ..
drwxr-xr-x  3 larsgeorge  wheel  102 Dec  3 00:43 java

mapred/src/main/java:
total 0
drwxr-xr-x  3 larsgeorge  wheel  102 Dec  3 00:43 .
drwxr-xr-x  3 larsgeorge  wheel  102 Dec  3 00:43 ..
drwxr-xr-x  3 larsgeorge  wheel  102 Dec  3 00:43 com.larsgeorge

mapred/src/main/java/com.larsgeorge:
total 8
drwxr-xr-x  3 larsgeorge  wheel   102 Dec  3 00:43 .
drwxr-xr-x  3 larsgeorge  wheel   102 Dec  3 00:43 ..
-rw-r--r--  1 larsgeorge  wheel  2365 Dec  3 00:43 WordCount.java

Let's check if it compiles on its own:

$ cd mapred
$ mvn package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building mapred
[INFO]    task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /private/tmp/mapred/src/main/resources
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Compiling 1 source file to /private/tmp/mapred/target/classes
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /private/tmp/mapred/src/test/resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] No sources to compile
[INFO] [surefire:test {execution: default-test}]
[INFO] No tests to run.
[INFO] [jar:jar {execution: default-jar}]
[INFO] Building jar: /private/tmp/mapred/target/mapred-1.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3 seconds
[INFO] Finished at: Fri Dec 03 01:15:37 CET 2010
[INFO] Final Memory: 25M/81M
[INFO] ------------------------------------------------------------------------

And finally we check if there is a job jar ready to be uploaded to a cluster for action:

$ ls -la target/
total 16
drwxr-xr-x  6 larsgeorge  wheel   204 Dec  3 01:15 .
drwxr-xr-x  7 larsgeorge  wheel   238 Dec  3 01:15 ..
drwxr-xr-x  3 larsgeorge  wheel   102 Dec  3 01:15 classes
drwxr-xr-x  3 larsgeorge  wheel   102 Dec  3 01:15 generated-sources
-rw-r--r--  1 larsgeorge  wheel  5380 Dec  3 01:15 mapred-1.0-SNAPSHOT.jar
drwxr-xr-x  3 larsgeorge  wheel   102 Dec  3 01:15 maven-archiver
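
Submitting the jar is outside what this archetype does, but on a machine with a configured Hadoop client it would typically go through `hadoop jar`. The sketch below only prints the command. The helper `submit_cmd` is hypothetical, the main class `com.foobar.WordCount` is an assumption based on the package entered above and the generated `WordCount.java`, and the `input` and `output` paths are placeholders that assume the job takes them as its two arguments.

```shell
# submit_cmd [JAR] [MAIN_CLASS] [INPUT] [OUTPUT]
# Builds the hadoop command line for the job jar; nothing is executed.
submit_cmd() {
  jar="${1:-target/mapred-1.0-SNAPSHOT.jar}"
  main_class="${2:-com.foobar.WordCount}"   # assumed class name
  input="${3:-input}"                       # placeholder input path
  output="${4:-output}"                     # placeholder output path
  echo "hadoop jar $jar $main_class $input $output"
}

submit_cmd
```

Check the package declaration in the generated `WordCount.java` before relying on the class name, and adjust the paths to match your cluster.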

Done!