Running MRMiniCluster with aegisthus
tarunsas opened this issue · 4 comments
Hi,
I want to debug my RecordReader, so I am trying to set up a mini MR cluster with the following Gradle dependencies:
dependencies {
    configurations.includeInJar {
        transitive = false
    }
    includeInJar project(':aegisthus-core')
    includeInJar 'org.apache.cassandra:cassandra-all:2.0.7'
    includeInJar 'org.apache.pig:pig:0.11.1'
    includeInJar 'org.xerial.snappy:snappy-java:1.0.4.1'
    compile 'org.slf4j:slf4j-api:1.6.3'
    compile 'org.apache.hadoop:hadoop-client:2.3.0-cdh5.0.1'
    testCompile 'org.apache.mrunit:mrunit:0.9.0-incubating:hadoop2'
    testCompile 'org.apache.hadoop:hadoop-minicluster:2.3.0-cdh5.0.1'
    testCompile 'org.apache.hadoop:hadoop-test:2.3.0-mr1-cdh5.0.1'
    configurations.compile.extendsFrom(configurations.includeInJar)
}
Do we need the hadoop-client dependency or hadoop-core?
I am getting a problem loading the JobTracker class:
2014-08-18 15:44:34,535 ERROR [Thread-95] mapred.MiniMRCluster (MiniMRCluster.java:run(122)) - Job tracker crashed
java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobTracker.startTracker(Lorg/apache/hadoop/mapred/JobConf;Ljava/lang/String;)Lorg/apache/hadoop/mapred/JobTracker;
at org.apache.hadoop.mapred.MiniMRCluster$JobTrackerRunner$1.run(MiniMRCluster.java:117)
at org.apache.hadoop.mapred.MiniMRCluster$JobTrackerRunner$1.run(MiniMRCluster.java:115)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.MiniMRCluster$JobTrackerRunner.run(MiniMRCluster.java:115)
at java.lang.Thread.run(Thread.java:744)
For Hadoop 1 we need the hadoop-core dependency and for Hadoop 2 we need the hadoop-client dependency.
With the latest version of Aegisthus it looks like we explicitly specify the hadoop dependencies we need. Could you please try adding these dependencies and see if that makes a difference?
compile "org.apache.hadoop:hadoop-common:$hadoopVersion"
compile "org.apache.hadoop:hadoop-hdfs:$hadoopVersion"
compile "org.apache.hadoop:hadoop-mapreduce-client-core:$hadoopVersion"
compile "org.apache.hadoop:hadoop-mapreduce-client-jobclient:$hadoopVersion"
If you still get the error I will need more info to help because I wasn't able to reproduce it yet.
Thanks, Daniel, for the reply.
Here are my Hadoop dependencies:
// include correct hadoop libraries for version
def includeHadoopLibs(project) {
    project.dependencies {
        compile 'org.apache.hadoop:hadoop-common:2.3.0-cdh5.0.1'
        compile 'org.apache.hadoop:hadoop-hdfs:2.3.0-cdh5.0.1'
        compile 'org.apache.hadoop:hadoop-mapreduce-client-core:2.3.0-cdh5.0.1'
        compile 'org.apache.hadoop:hadoop-mapreduce-client-jobclient:2.3.0-cdh5.0.1'
        compile 'org.apache.hadoop:hadoop-client:2.3.0-cdh5.0.1'
        compile 'org.apache.mrunit:mrunit:0.9.0-incubating:hadoop2'
        compile 'org.apache.hadoop:hadoop-minicluster:2.3.0-cdh5.0.1'
        compile 'org.apache.hadoop:hadoop-test:2.3.0-mr1-cdh5.0.1'
    }
}
The goal is to integrate MiniMRCluster with Aegisthus so that we can run TestNG tests against a local cluster.
When I try to start the cluster, there is a classloader issue:
JobTracker is loaded from "hadoop-mapreduce-client-core" instead of "hadoop-core".
2014-08-19 08:51:42,473 ERROR [Thread-95] mapred.MiniMRCluster (MiniMRCluster.java:run(122)) - Job tracker crashed
java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobTracker.startTracker(Lorg/apache/hadoop/mapred/JobConf;Ljava/lang/String;)Lorg/apache/hadoop/mapred/JobTracker;
at org.apache.hadoop.mapred.MiniMRCluster$JobTrackerRunner$1.run(MiniMRCluster.java:117)
at org.apache.hadoop.mapred.MiniMRCluster$JobTrackerRunner$1.run(MiniMRCluster.java:115)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.MiniMRCluster$JobTrackerRunner.run(MiniMRCluster.java:115)
at java.lang.Thread.run(Thread.java:744)
2014-08-19 08:51:43,466 INFO [Test worker] Conf
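A NoSuchMethodError like the one above generally means the class found at runtime is not the one the caller was compiled against. One way to confirm which jar actually supplies JobTracker inside the test JVM is to ask the class for its code source. The following is only a minimal sketch: `ClassOrigin` and its `locate` methods are hypothetical names, not part of Aegisthus or Hadoop, and the code uses only standard JDK calls. It looks the class up by name, so it compiles without Hadoop on the classpath.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper; not part of Aegisthus or Hadoop. */
public class ClassOrigin {

    /** Returns the location a class was loaded from, or a placeholder if unknown. */
    public static String locate(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader typically report no code source.
        if (source == null || source.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    /** Resolves the class by name (without initializing it) and reports its origin. */
    public static String locate(String className) {
        try {
            ClassLoader loader = ClassOrigin.class.getClassLoader();
            return locate(Class.forName(className, false, loader));
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath: " + className + ">";
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace above.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapred.JobTracker";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Calling `ClassOrigin.locate("org.apache.hadoop.mapred.JobTracker")` from a TestNG setup method, before the mini cluster starts, would print the jar path that wins on the test classpath.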
Complete build.gradle.
// Establish version and status
ext.githubProjectName = rootProject.name // Change if github project name is not the same as the root project's name
buildscript {
    repositories {
        mavenCentral()
    }
    apply from: file('gradle/buildscript.gradle'), to: buildscript
}
allprojects {
    configurations {
        includeInJar
    }
    repositories {
        mavenCentral()
        maven {
            url {
                "https://repository.cloudera.com/artifactory/cloudera-repos"
            }
        }
    }
}
apply from: file('gradle/convention.gradle')
apply from: file('gradle/maven.gradle')
//apply from: file('gradle/check.gradle')
//apply from: file('gradle/license.gradle')
//apply from: file('gradle/release.gradle')
subprojects {
    group = "acsi.grid" // TEMPLATE: Set to organization of project
    dependencies {
        testCompile 'org.testng:testng:6.1.1'
        testCompile 'org.easymock:easymock:3.0'
        testCompile "org.mockito:mockito-all:1.9.5"
    }
}
project(':aegisthus-distcp') {
    apply plugin: 'java'
    apply plugin: 'eclipse'
    dependencies {
        configurations.includeInJar {
            transitive = false
        }
        includeInJar project(':aegisthus-core')
        includeInJar 'org.xerial.snappy:snappy-java:1.0.4.1'
        includeInJar 'com.google.guava:guava:12.0'
        configurations.compile.extendsFrom(configurations.includeInJar)
    }
    jar {
        from { configurations.includeInJar.collect { it.isDirectory() ? it : zipTree(it) } }
    }
}
project(':aegisthus-core') {
    apply plugin: 'java'
    apply plugin: 'eclipse'
    dependencies {
        includeInJar 'com.google.guava:guava:12.0'
        includeInJar 'com.fasterxml.jackson.core:jackson-core:2.1.4'
        includeInJar 'com.fasterxml.jackson.core:jackson-annotations:2.1.4'
        includeInJar 'com.fasterxml.jackson.core:jackson-databind:2.1.4'
        compile 'org.apache.pig:pig:0.11.1'
        compile 'org.slf4j:slf4j-api:1.6.3'
        includeHadoopLibs(project)
        configurations.compile.extendsFrom(configurations.includeInJar)
    }
    jar {
        from { configurations.includeInJar.collect { it.isDirectory() ? it : zipTree(it) } }
    }
}
project(':aegisthus-hadoop') {
    apply plugin: 'java'
    apply plugin: 'eclipse'
    test {
        useTestNG()
        beforeTest { descriptor ->
            logger.lifecycle("Running test: " + descriptor)
        }
    }
    dependencies {
        configurations.includeInJar {
            transitive = false
        }
        includeInJar project(':aegisthus-core')
        includeInJar 'org.apache.cassandra:cassandra-all:2.0.7'
        includeInJar 'org.apache.pig:pig:0.11.1'
        includeInJar 'org.xerial.snappy:snappy-java:1.0.4.1'
        compile 'org.slf4j:slf4j-api:1.6.3'
        includeHadoopLibs(project)
        configurations.compile.extendsFrom(configurations.includeInJar)
    }
    jar {
        from { configurations.includeInJar.collect { it.isDirectory() ? it : zipTree(it) } }
    }
}
project(':aegisthus-pig') {
    apply plugin: 'java'
    apply plugin: 'eclipse'
    dependencies {
        includeInJar project(':aegisthus-core')
        compile 'org.apache.pig:pig:0.11.1'
        compile 'joda-time:joda-time:1.6'
        includeHadoopLibs(project)
        configurations.compile.extendsFrom(configurations.includeInJar)
    }
    jar {
        from { configurations.includeInJar.collect { it.isDirectory() ? it : zipTree(it) } }
    }
}
// include correct hadoop libraries for version
def includeHadoopLibs(project) {
    project.dependencies {
        compile 'org.apache.hadoop:hadoop-common:2.3.0-cdh5.0.1'
        compile 'org.apache.hadoop:hadoop-hdfs:2.3.0-cdh5.0.1'
        compile 'org.apache.hadoop:hadoop-mapreduce-client-core:2.3.0-cdh5.0.1'
        compile 'org.apache.hadoop:hadoop-mapreduce-client-jobclient:2.3.0-cdh5.0.1'
        compile 'org.apache.hadoop:hadoop-client:2.3.0-cdh5.0.1'
        compile 'org.apache.mrunit:mrunit:0.9.0-incubating:hadoop2'
        compile 'org.apache.hadoop:hadoop-minicluster:2.3.0-cdh5.0.1'
        compile 'org.apache.hadoop:hadoop-test:2.3.0-mr1-cdh5.0.1'
    }
}
I still haven't been able to reproduce this. It seems like a version conflict issue.
Here are the steps I tried:
1. Pull latest version of aegisthus master.
2. Replace build.gradle with your build.gradle from above.
3. Change the cassandra dependency in aegisthus-hadoop to 1.2.15 (I had build errors on master when using cassandra 2.0.7 like you did above).
4. Build Aegisthus:
   ./gradlew clean build
5. Download hadoop v2.3.0.
6. Start the hadoop minicluster:
   bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.3.0-tests.jar minicluster
7. Run aegisthus that was built in step 4 above:
   bin/hadoop jar /Users/dwatson/code/public_github/netflix/aegisthus/aegisthus-hadoop/build/libs/aegisthus-hadoop-0.1.2.jar com.netflix.Aegisthus -input sstables -output output
Please let me know if there is anything else I can do to help.
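If a version conflict is suspected, it may help to check whether more than one jar on the classpath carries a copy of the same class. The sketch below is one possible way to do that with plain JDK calls. `DuplicateClassFinder`, `toResourceName`, and `findCopies` are hypothetical names introduced for illustration, not part of Aegisthus or Hadoop.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper; not part of Aegisthus or Hadoop. */
public class DuplicateClassFinder {

    /** Converts a fully qualified class name to its classpath resource name. */
    public static String toResourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every location visible to the loader that contains the class file. */
    public static List<URL> findCopies(String className, ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(toResourceName(className)));
    }

    public static void main(String[] args) throws IOException {
        // Default to the class named in the reported stack trace.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapred.JobTracker";
        ClassLoader loader = DuplicateClassFinder.class.getClassLoader();
        List<URL> copies = findCopies(name, loader);
        System.out.println(copies.size() + " location(s) contain " + name);
        for (URL url : copies) {
            System.out.println("  " + url);
        }
    }
}
```

More than one location in the output would suggest that two artifacts ship the same class and that classpath order decides which one is used.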
Thanks, Daniel, for the reply.
I was trying to start the Hadoop mini cluster within TestNG unit tests. I can start it now without any issues.
Thanks again
Sriram