sakserv/hadoop-mini-clusters

InJVMContainerExecutor

Closed this issue · 6 comments

When I try to use the InJVMContainerExecutor, I get the following error:
java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.v2.app.MRAppMaster

I am running the MiniYarnCluster from within JUnit tests.
It ignores the classpath that the DefaultContainerExecutor creates in the shell script.

I modified the class so that it parses the export CLASSPATH statement in launch_container.sh, and I can now get past this problem. However, MRAppMaster creates a new YarnConfiguration, which by default sets fs.defaultFS to file:///, so the staging directory for the job is no longer found because it is searched for on the local file system.
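Roughly, the change I made looks something like the sketch below. It is a simplified, hypothetical version; the actual launch script path and the surrounding code are specific to my setup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Simplified sketch: pull the value of the "export CLASSPATH=..." line out of
// the container's launch_container.sh so it can be appended to the current
// JVM's classpath before invoking MRAppMaster.
public class LaunchScriptClasspath {

    static String extractClasspath(String launchScriptPath) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(launchScriptPath));
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("export CLASSPATH=")) {
                // Strip the quotes the NodeManager writes around the value.
                return trimmed.substring("export CLASSPATH=".length())
                              .replaceAll("^\"|\"$", "");
            }
        }
        return null; // no CLASSPATH export found
    }
}
```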

@mohnishkodnani Sorry for the delay. Would it be possible to point me to a repo with tests that recreate this problem?

Closing for now. Please open a new ticket if you still need assistance.

I don't have a simple reproducible test case yet; my framework is complicated. But here are my observations:

  1. We start an HDFS cluster.
  2. Set fs.defaultFS to that location for our MapReduce cluster.
  3. Launch a MapReduce job.
  4. We use InJVMContainerExecutor. The InJVMContainerExecutor calls MRAppMaster's main method to run the MapReduce job, and inside that main method MRAppMaster does new JobConf(new YarnConfiguration()).
    If you look at this configuration object while debugging, you will see that fs.defaultFS = file:///, i.e. the local file system, not the one started with our DFS cluster (see the snippet after this list).
    Normally MRAppMaster probably runs in an environment where that line of code carries forward all the settings from core-site.xml or something similar, but not in this case.
    Because of this, when MRAppMaster tries to check whether the staging directory exists, it fails with a FATAL error: debug logs show the directory does exist, but on the DFS cluster, while MRAppMaster ends up checking the local FS.
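For reference, a minimal snippet that shows the behavior in step 4, assuming no core-site.xml that sets fs.defaultFS is visible on the classpath of the process running MRAppMaster:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DefaultFsCheck {
    public static void main(String[] args) {
        // This mirrors what MRAppMaster#main does.
        JobConf conf = new JobConf(new YarnConfiguration());
        // Prints "file:///" unless a core-site.xml setting fs.defaultFS is on the classpath.
        System.out.println(conf.get("fs.defaultFS"));
    }
}
```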

Thanks for the follow-up. Looking into MRAppMaster, it is not immediately obvious how I can fix this: main calls initAndStartAppMaster with the JobConf created from the new YarnConfiguration, which in turn calls init with that JobConf object. Let me do some additional testing here and I'll get back to you ASAP.

After a bit more testing, I believe what will work is to take your test's Configuration objects, write them out to the appropriate -site.xml files on the local filesystem, and add the path where the configurations were written to the classpath when submitting the application to YARN. This will allow MRAppMaster to pick up the configuration from the classpath in its environment. Unfortunately, there isn't a clean way for me to accomplish this in InJvmContainerExecutor itself, since it relies on the configuration coming from the application master.
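Something along these lines should work for writing out the -site.xml files. This is a rough sketch; the confDir location is an assumption on your side and must match whatever directory you add to the classpath when submitting the application.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;

public class SiteXmlWriter {

    // Writes the test's Configuration objects out as core-site.xml and yarn-site.xml
    // so MRAppMaster can pick them up from its classpath.
    static void writeSiteFiles(Configuration coreConf, Configuration yarnConf, File confDir)
            throws IOException {
        confDir.mkdirs();
        try (OutputStream out = new FileOutputStream(new File(confDir, "core-site.xml"))) {
            coreConf.writeXml(out); // serializes fs.defaultFS and friends as XML properties
        }
        try (OutputStream out = new FileOutputStream(new File(confDir, "yarn-site.xml"))) {
            yarnConf.writeXml(out);
        }
    }
}
```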

If you are using a Client similar to mine, take a look at Client#setupAppMasterEnv.
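For reference, a typical setupAppMasterEnv looks roughly like the sketch below. This is not my exact code; the confDirPath parameter is an assumption and should point at the directory where the -site.xml files from the previous comment were written.

```java
import java.io.File;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Apps;

public class AppMasterEnv {

    // Builds the CLASSPATH environment for the application master: the generated
    // conf directory first (so its *-site.xml files win), then the standard
    // YARN application classpath entries.
    static void setupAppMasterEnv(Map<String, String> env, Configuration conf, String confDirPath) {
        Apps.addToEnvironment(env, ApplicationConstants.Environment.CLASSPATH.name(),
                confDirPath, File.pathSeparator);
        for (String entry : conf.getStrings(
                YarnConfiguration.YARN_APPLICATION_CLASSPATH,
                YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
            Apps.addToEnvironment(env, ApplicationConstants.Environment.CLASSPATH.name(),
                    entry.trim(), File.pathSeparator);
        }
    }
}
```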