Huawei-Spark/Spark-SQL-on-HBase

MR job failure caused by exporting SPARK_CLASSPATH

AllenFang opened this issue · 15 comments

Hi, this is cool stuff for Spark SQL with HBase. However, I've run into an issue, as follows:

I've installed your product following the documentation and it all works well so far. But I wrote a very simple Spark application that queries an HBase table using newAPIHadoopRDD and got these errors:

Application application_1439169262151_0037 failed 2 times due to AM Container for appattempt_1439169262151_0037_000002 exited with  exitCode: 127 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: 
org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:114)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
    at com.hbase.HBaseQueryWithRDD$.main(HBaseQueryWithRDD.scala:18)
    at com.hbase.HBaseQueryWithRDD.main(HBaseQueryWithRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2015-08-12 10:15:53,360 INFO  [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Stopping DAGScheduler
2015-08-12 10:15:53,362 ERROR [main] spark.SparkContext (Logging.scala:logError(96)) - Error stopping SparkContext after init error.
java.lang.NullPointerException
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:150)
    at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:416)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1404)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1642)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:565)
    at com.hbase.HBaseQueryWithRDD$.main(HBaseQueryWithRDD.scala:18)
    at com.hbase.HBaseQueryWithRDD.main(HBaseQueryWithRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:114)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
    at com.hbase.HBaseQueryWithRDD$.main(HBaseQueryWithRDD.scala:18)
    at com.hbase.HBaseQueryWithRDD.main(HBaseQueryWithRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

But if I remove spark-sql-on-hbase-1.0.0.jar from the SPARK_CLASSPATH, the job passes.

My Spark version is 1.4.0 and my Hadoop version is 2.3.

Can you check Yarn's log to see why the Application Master can't be launched, which seems to be the root cause of your exception?
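For what it's worth, exit code 127 from container-launch usually means the AM launch script tried to run a command that was not found on the node. If log aggregation is enabled on your cluster, you can pull the full AM container log with the yarn CLI, e.g. for the application id in your trace:

yarn logs -applicationId application_1439169262151_0037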

Hi @yzhou2001, the error message I provided is already from the YARN logs, and the first error message is:

Application application_1439169262151_0037 failed 2 times due to AM Container for appattempt_1439169262151_0037_000002 exited with  exitCode: 127 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: 
org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The message is just like the one I posted above.

Allen,

You say hadoop-yarn v2.3.0 is included in your Spark v1.4.0 shaded jar.

What version of hadoop-yarn is included in the spark-sql-on-hbase-1.0.0.jar you removed from $SPARK_CLASSPATH? Is it also v2.3.0?

And can you post your spark program (driver) to help us reproduce the problem?

Thanks,
Stan

I used this command to package the jar, but I'm not sure whether it's correct:

mvn clean package -Phbase,hadoop-2.3 -DskipTests

So how do I package the jar against a specific Hadoop version?

And my driver program is below:

val tableName = "XXX";
val conf = new SparkConf().setAppName("HBase_Query_with_RDD");
val sc   = new SparkContext(conf);
    
val hbaseConf = HBaseConfiguration.create();
hbaseConf.set("hbase.zookeeper.quorum","server-a1")
hbaseConf.set("hbase.zookeeper.property.clientPort","2181")
hbaseConf.set("mapreduce.framework.name", "yarn")
hbaseConf.set("yarn.resourcemanager.address", "server-a1:8032")
hbaseConf.set("yarn.resourcemanager.scheduler.address", "server-a1:8030")
hbaseConf.set("yarn.resourcemanager.resource-tracker.address", "server-a1:8031");
hbaseConf.set("yarn.resourcemanager.admin.address", "server-a1:8033")
hbaseConf.set(TableInputFormat.INPUT_TABLE, tableName)
    
var table = new HTable(hbaseConf, tableName)
val hbaseRDD = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat], 
                                classOf[ImmutableBytesWritable], 
                                classOf[Result])
    
println("contain result: " + hbaseRDD.count())
table.close()
sc.stop() 

And here are the YARN ResourceManager messages for more detail; I hope they're helpful for you.

2015-08-13 10:25:28,434 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1439169262151_0057_000002 State change from FINAL_SAVING to FAILED
2015-08-13 10:25:28,434 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating application application_1439169262151_0057 with final state: FAILED
2015-08-13 10:25:28,434 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1439169262151_0057 State change from ACCEPTED to FINAL_SAVING
2015-08-13 10:25:28,434 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1439169262151_0057_000002 is done. finalState=FAILED
2015-08-13 10:25:28,435 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1439169262151_0057
2015-08-13 10:25:28,435 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1439169262151_0057 requests cleared
2015-08-13 10:25:28,435 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1439169262151_0057 user: user1 queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2015-08-13 10:25:28,435 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1439169262151_0057 user: user1 leaf-queue of parent: root #applications: 0
2015-08-13 10:25:28,435 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1439169262151_0057 failed 2 times due to AM Container for appattempt_1439169262151_0057_000002 exited with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
2015-08-13 10:25:28,435 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1439169262151_0057 State change from FINAL_SAVING to FAILED
2015-08-13 10:25:28,435 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=user1    OPERATION=Application Finished - Failed TARGET=RMAppManager     RESULT=FAILURE  DESCRIPTION=App failed with state: FAILED       PERMISSIONS=Application application_1439169262151_0057 failed 2 times due to AM Container for appattempt_1439169262151_0057_000002 exited with  exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Hi Allen,

I just wanted to make sure you did not have hadoop-yarn version conflicts. I don't think you do, and you packaged the jar correctly.
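(And to your packaging question: the -Phadoop-2.3 profile in your mvn command should be what selects the Hadoop version, so your build line already handles that.)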

You do not need to run your Yarn container to create a NewHadoopRDD with Spark-SQL-on-HBase.

Here is a simple example that works in my environment.

The data set:

1,xiaoming,16,id_1,teacherW
2,xiaoming,16,id_2,teacherW
3,xiaoming,16,id_3,teacherW
4,xiaoming,16,id_4,teacherW
5,xiaoming,16,id_5,teacherW
6,xiaoming,16,id_6,teacherW
7,xiaoming,16,id_7,teacherW
8,xiaoming,16,id_8,teacherW
9,xiaoming,16,id_9,teacherW
10,xiaoming,16,id_10,teacherW
11,xiaoming,16,id_11,teacherW
12,xiaoming,16,id_12,teacherW
13,xiaoming,16,id_13,teacherW
14,xiaoming,16,id_14,teacherW
15,xiaoming,16,id_15,teacherW
16,xiaoming,16,id_16,teacherW
17,xiaoming,16,id_17,teacherW
18,xiaoming,16,id_18,teacherW
19,xiaoming,16,id_19,teacherW
1001,lihua,20,A1000,
1002,lihua,20,A1000,

column-family='cf'
rowkey:string
columns:datatype -> a:string, b:string, c:string, d:string (col 'd' is nullable)
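
In case you want to recreate the table itself, here is a rough sketch (my assumption: the 0.98-era HBase admin API that this project builds against; adjust for your client version):

import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}

// Create the PEOPLE table with the single column family 'cf'
// (columns a, b, c and d all live in this family; 'd' may be empty).
val conf = HBaseConfiguration.create()
val admin = new HBaseAdmin(conf)
if (!admin.tableExists("PEOPLE")) {
  val desc = new HTableDescriptor(TableName.valueOf("PEOPLE"))
  desc.addFamily(new HColumnDescriptor("cf"))
  admin.createTable(desc)
}
admin.close()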

The driver:

package org.apache.spark.sql.hbase

import org.apache.hadoop.hbase.client.{HTable, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{Cell, CellUtil}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object NewHadoopRDDExample {

  def main(args: Array[String]) {
    println("NewHadoopRDDExample")

    val sparkHome = System.getenv("SPARK_HOME")
    val tableName = "PEOPLE"
    val sparkConf = new SparkConf(true)
      .setMaster("local[2]")
      .setAppName("NewHadoopRDDExample")
      .set("spark.executor.memory", "1g")

    val sc = new SparkContext(sparkConf)

    val hbaseContext = new org.apache.spark.sql.hbase.HBaseSQLContext(sc)
    val hbaseConf = hbaseContext.sparkContext.hadoopConfiguration
    hbaseConf.set("fs.defaultFS", "hdfs://YOUR-NAMENODE:54310")
    hbaseConf.set("hbase.zookeeper.quorum", "YOUR-ZK-CNXN-STRING")
    hbaseConf.set(TableInputFormat.INPUT_TABLE, tableName)

    val table = new HTable(hbaseConf, tableName)

    val hbaseRDD = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])
    println("HBase RDD Count: " + hbaseRDD.count)


    println("\nHBase KeyValues:")
    hbaseRDD.foreach(println)

    // Have to map ImmutableBytesWritables to serializable objects before running rdd.collect
    val cellsRDD: RDD[(String, Array[String])] = hbaseRDD.map(x => x._2).map(result => {
      val rowkey = result.getRow
      val col1: Cell = result.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("a"))
      val col2: Cell = result.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("b"))
      val col3: Cell = result.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("c"))
      val col4: Cell = result.getColumnLatestCell(Bytes.toBytes("cf"), Bytes.toBytes("d"))

      val arr = new Array[String](4)
      arr(0) = (Bytes.toStringBinary(CellUtil.cloneValue(col1)))
      arr(1) = (Bytes.toStringBinary(CellUtil.cloneValue(col2)))
      arr(2) = (Bytes.toStringBinary(CellUtil.cloneValue(col3)))
      // col 'd' is nullable
      arr(3) = if (col4 != null) (Bytes.toStringBinary(CellUtil.cloneValue(col4))) else null
      (Bytes.toStringBinary(rowkey), arr)
    })

    println("\nDeserialized Rows:")
    val tuples = cellsRDD.collect
    for (i <- 0 until tuples.length) {
      print("Row: " + tuples(i)._1)
      print(" => " + tuples(i)._2.mkString(" | "))
      println
    }

    table.close()
  }
}
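
(For reference, I run this object directly from my IDE; since the master is set to local[2] above, no cluster submission is involved.)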

Sorry, I don't know what "running your Yarn container to create a NewHadoopRDD with Spark-SQL-on-HBase" actually means.

Anyway, if I just write a Spark application that reads an HDFS file and counts the rows, the error message is the same.

Your driver's configuration assumes a running Yarn container:
hbaseConf.set("mapreduce.framework.name", "yarn")
hbaseConf.set("yarn.resourcemanager.address", "server-a1:8032")
hbaseConf.set("yarn.resourcemanager.scheduler.address", "server-a1:8030")
hbaseConf.set("yarn.resourcemanager.resource-tracker.address", "server-a1:8031");
hbaseConf.set("yarn.resourcemanager.admin.address", "server-a1:8033")

Can you run the driver example I posted, using Spark-SQL-on-HBase?

You will see no references to Yarn in the working example's configuration, and you should have no Yarn container start-up or connection errors because Spark-SQL-on-HBase will not try to submit a job to Yarn.

Hi @sparksburnitt, yeah, you're right: if I use your sample with Spark-SQL-on-HBase, it works. Thanks a lot. But just like I said before, the error where a very simple Spark application running on YARN fails to start the application master still exists. ;(

Hello Allen,

I was able to run your Spark-only code with spark-sql-on-hbase-1.0.0.jar in the SPARK_CLASSPATH without a problem.

You do not need those yarn property settings in your HBase config object.

Why don't you try it again after replacing the yarn properties below with your 'fs.defaultFS' value?
/*
hbaseConf.set("mapreduce.framework.name", "yarn")
hbaseConf.set("yarn.resourcemanager.address", "server-a1:8032")
hbaseConf.set("yarn.resourcemanager.scheduler.address", "server-a1:8030")
hbaseConf.set("yarn.resourcemanager.resource-tracker.address", "server-a1:8031")
hbaseConf.set("yarn.resourcemanager.admin.address", "server-a1:8033")
*/

// Spark needs to know where your hdfs root is:
hbaseConf.set("fs.defaultFS", "hdfs://YOUR-NAMENODE:PORT")

Let me know if you still see yarn errors.

-Stan

Hi @sparksburnitt, I've already tried that, but the result is the same. I forgot to tell you, very sorry.

Hi Allen,

I just ran the example with your yarn settings (adjusted for my environment) and still could not reproduce the errors.

How are you submitting the job?

  • Are you submitting a Hadoop MapReduce job from the command line, using the 'hadoop' shell script?
  • Are you submitting a Spark job using spark-submit?
  • Are you running a Scala script from the spark-shell?
  • A Spark driver executing in your IDE?

(I have been running these examples from a Scala object in an IDE.)

Hi Allen,

I ran the Scala script below in the spark-shell (v1.4.0), and could not reproduce the errors.

Does it work on your cluster?

...

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

val tableName = "XXXX_TBL"

val hbaseConf = HBaseConfiguration.create
hbaseConf.set("fs.defaultFS", "hdfs://SERVER:54310")
hbaseConf.set("hbase.zookeeper.quorum", "SERVER:2181")

hbaseConf.set("mapreduce.framework.name", "yarn")
hbaseConf.set("yarn.resourcemanager.address", "SERVER:8032")
hbaseConf.set("yarn.resourcemanager.scheduler.address", "SERVER:8030")
hbaseConf.set("yarn.resourcemanager.resource-tracker.address", "SERVER:8025")
hbaseConf.set("yarn.resourcemanager.admin.address", "SERVER:8033")

hbaseConf.set(TableInputFormat.INPUT_TABLE, tableName)

val table = new HTable(hbaseConf, tableName)

val hbaseRDD = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

println("HBase RDD Count: " + hbaseRDD.count)
table.close
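
A note on how I launch the shell: I have the jar on the classpath via SPARK_CLASSPATH. If you prefer not to export it, an equivalent launch should be something like:

bin/spark-shell --jars /path/to/spark-sql-on-hbase-1.0.0.jar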

Hi @sparksburnitt, I always run my application via spark-submit in yarn-client mode. But I did the following tests, inspired by you.

  1. I've written some very simple Scala code and packaged it as a jar:

    ...
    val input = sc.parallelize(Array(1,2,3,4,5))
    println(input.count())
    sc.stop() 
    
  • Run via spark-submit with yarn-client: the result is failure, and the problem is the same.
  • Run via spark-submit with yarn-cluster: the result is success.
  • Run via spark-submit with local: the result is success.
  • Run the code in spark-shell:
    The result is success, but I think spark-shell always runs locally; it doesn't use YARN to run this job, so the error about starting the application master never happens.

Anyway, I also ran the code you provided above, and the result is OK!! But I have no idea why the application fails only when running on yarn-client. I think I won't try to solve this problem for now, because running on yarn-cluster works, so I can keep working. In any case, thanks for your help, and very sorry to have taken so much of your time.

Hello Allen,

I was able to run your job in yarn-client mode via spark-submit.

In the driver, I changed the spark-conf's master to 'yarn-client', e.g.

val sparkConf = new SparkConf(true)
  .setMaster("yarn-client")
  .setAppName("SparkOnYarnExample")
  .set("spark.executor.memory", "4g")

and built a 'fang.jar' containing the spark driver.

Here is the spark-submit command:

export SPARK_JAR=YR_PATH/spark-assembly-1.4.0-hadoop2.4.0.jar
export SPARK_SQL_HBASE_JAR=YR_PATH/spark-sql-on-hbase-1.0.0.jar

bin/spark-submit --class org.apache.spark.sql.hbase.SparkOnYarnExample \
  --master yarn-client \
  --jars $SPARK_SQL_HBASE_JAR \
  --num-executors 6 \
  --driver-memory 4g \
  --executor-memory 4g \
  --executor-cores 6 \
  /tmp/fang.jar
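
One last thought, offered as an untested suggestion rather than a verified fix: SPARK_CLASSPATH is deprecated in Spark 1.x, and exporting it in yarn-client mode can affect how the AM is launched. If your yarn-client failure persists, try dropping the export and passing the jar explicitly instead:

bin/spark-submit --class org.apache.spark.sql.hbase.SparkOnYarnExample \
  --master yarn-client \
  --jars $SPARK_SQL_HBASE_JAR \
  --driver-class-path $SPARK_SQL_HBASE_JAR \
  /tmp/fang.jar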