MapReduce Example Fails
pooleja opened this issue · 4 comments
I have been trying to get the MapReduce example to run against a cluster with the following version:
Hadoop 2.2.0.2.0.6.0-101
I built the example with the following command:
mvn clean package -Phadoop22
The first error I encountered was seen on the logs for the Map task:
java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster
To get around this error, I added the following config item to the application-context.xml for the config object:
yarn.application.classpath=$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
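For context, a minimal sketch of where that property can live in application-context.xml, using Spring Hadoop's `<hdp:configuration>` element (the namespace declarations are abbreviated and `${hd.fs}` is a placeholder for your own filesystem URI):

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/hadoop
           http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <!-- Hadoop properties can be listed inline as key=value pairs -->
    <hdp:configuration>
        fs.defaultFS=${hd.fs}
        yarn.application.classpath=$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
    </hdp:configuration>
</beans>
```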
Now the job gets submitted and starts to execute properly; however, it only succeeds if the task is executed on the same node the Resource Manager is running on. It is a 3-node cluster, where Node1 runs the Resource Manager. When a job gets submitted to Node2 or Node3, it fails with (repeating):
2014-02-06 12:05:52,135 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
If I run a sample map/reduce job outside of Spring Hadoop, it executes as expected on any of the nodes, so I don't think it is a problem with the cluster setup. It seems like the Spring Hadoop libraries are picking up a setting where the task thinks the Resource Manager is running on the local node.
Please let me know if you have any suggestions.
Could you try setting this property in your config:
yarn.resourcemanager.hostname
Set that to the hostname where the RM is running.
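A hedged sketch of what that looks like in the Spring Hadoop config (the hostname `node1.example.com` is a placeholder for the actual RM host):

```xml
<hdp:configuration>
    <!-- Point all nodes at the host actually running the Resource Manager;
         without this, clients fall back to the 0.0.0.0 default -->
    yarn.resourcemanager.hostname=node1.example.com
</hdp:configuration>
```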
Yep, that was it. Thanks!
Is there any guidance from the Spring Hadoop team on the best way to ensure all the properties are correct? For example, would it make sense to copy a yarn-site.xml and mapred-site.xml to the client machine and pull them into the Spring config? Or could that cause other types of problems?
I've used both ways for providing config options and the net effect is the same, so pick the one that you are more comfortable using. I tend to prefer to collect my configurations in a properties file that's part of my application.
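A sketch of both approaches, assuming Spring Hadoop's `resources` and `properties-location` attributes on `<hdp:configuration>` (the classpath locations are placeholders for wherever your files actually live):

```xml
<!-- Option 1: pull in the cluster's own site files copied to the client -->
<hdp:configuration
    resources="classpath:/yarn-site.xml,classpath:/mapred-site.xml"/>

<!-- Option 2: keep the settings in a properties file shipped with the app -->
<hdp:configuration
    properties-location="classpath:/hadoop.properties"/>
```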
Hi, I have the same problem but couldn't solve it. Please help. I have a 5-node cluster with one master and 4 slaves.
I have set the IP address of the master node (in fact I even tried hard-coding it) as 'yarn.resourcemanager.hostname' in the yarn-site.xml file, but I still get the following in the log files:
ERROR:........Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-03-05 20:15:50,597 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-03-05 20:15:50,603 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2014-03-05 20:15:56,632 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
What could be the reason? Why is Hadoop not picking up the parameter that I set?