Huawei-Spark/Spark-SQL-on-HBase

Error on executing 'Select * from tablename'

rkiyer999 opened this issue · 1 comment

I get an ArrayIndexOutOfBoundsException when I execute 'select * from sales'.
Details are below:

HBase table:
describe 'sales'
Table sales is ENABLED
sales
COLUMN FAMILIES DESCRIPTION
{NAME => 'sales_des', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1240 seconds

scan 'sales'
ROW COLUMN+CELL
0 column=sales_des:product, timestamp=1444305686288, value=pr0
0 column=sales_des:quantity, timestamp=1444311988162, value=0
0 column=sales_des:region, timestamp=1444305702221, value=reg0
0 column=sales_des:sales, timestamp=1444312378336, value=0
0 column=sales_des:tranid, timestamp=1444302264948, value=0
1 row(s) in 0.4380 seconds

Spark SQL on HBase table definition:
CREATE TABLE sales(tranid INTEGER, product STRING, region STRING, sales INTEGER, quantity INTEGER, PRIMARY KEY (tranid)) MAPPED BY (sales, COLS=[product=sales_des.product, region=sales_des.region, sales=sales_des.sales, quantity=sales_des.quantity]);

Error:
select * from sales;
15/10/08 15:15:35 INFO hbase.HBaseSQLCliDriver: Processing select * from sales
15/10/08 15:15:35 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=sandbox.hortonworks.com:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x5a713416, quorum=sandbox.hortonworks.com:2181, baseZNode=/hbase-unsecure
15/10/08 15:15:35 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x5a713416 connecting to ZooKeeper ensemble=sandbox.hortonworks.com:2181
15/10/08 15:15:35 INFO zookeeper.ClientCnxn: Opening socket connection to server sandbox.hortonworks.com/10.0.2.15:2181. Will not attempt to authenticate using SASL (unknown error)
15/10/08 15:15:35 INFO zookeeper.ClientCnxn: Socket connection established to sandbox.hortonworks.com/10.0.2.15:2181, initiating session
15/10/08 15:15:35 INFO zookeeper.ClientCnxn: Session establishment complete on server sandbox.hortonworks.com/10.0.2.15:2181, sessionid = 0x15046c4e1230031, negotiated timeout = 40000
15/10/08 15:15:35 INFO zookeeper.ZooKeeper: Session: 0x15046c4e1230031 closed
15/10/08 15:15:35 INFO zookeeper.ClientCnxn: EventThread shut down
15/10/08 15:15:35 INFO hbase.HBaseRelation: Number of HBase regions for table sales: 1
15/10/08 15:15:35 INFO spark.SparkContext: Starting job: main at NativeMethodAccessorImpl.java:-2
15/10/08 15:15:35 INFO scheduler.DAGScheduler: Got job 6 (main at NativeMethodAccessorImpl.java:-2) with 1 output partitions (allowLocal=false)
15/10/08 15:15:35 INFO scheduler.DAGScheduler: Final stage: ResultStage 6(main at NativeMethodAccessorImpl.java:-2)
15/10/08 15:15:35 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/10/08 15:15:35 INFO scheduler.DAGScheduler: Missing parents: List()
15/10/08 15:15:35 INFO scheduler.DAGScheduler: Submitting ResultStage 6 (MapPartitionsRDD[13] at main at NativeMethodAccessorImpl.java:-2), which has no missing parents
15/10/08 15:15:35 INFO storage.MemoryStore: ensureFreeSpace(18176) called with curMem=2931, maxMem=278302556
15/10/08 15:15:35 INFO storage.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 17.8 KB, free 265.4 MB)
15/10/08 15:15:35 INFO storage.MemoryStore: ensureFreeSpace(16520) called with curMem=21107, maxMem=278302556
15/10/08 15:15:36 INFO storage.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 16.1 KB, free 265.4 MB)
15/10/08 15:15:36 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on localhost:60580 (size: 16.1 KB, free: 265.4 MB)
15/10/08 15:15:36 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:874
15/10/08 15:15:36 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 6 (MapPartitionsRDD[13] at main at NativeMethodAccessorImpl.java:-2)
15/10/08 15:15:36 INFO scheduler.TaskSchedulerImpl: Adding task set 6.0 with 1 tasks
15/10/08 15:15:36 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 6, localhost, ANY, 1702 bytes)
15/10/08 15:15:36 INFO executor.Executor: Running task 0.0 in stage 6.0 (TID 6)
15/10/08 15:15:36 INFO hbase.HBasePartition: None
15/10/08 15:15:36 ERROR executor.Executor: Exception in task 0.0 in stage 6.0 (TID 6)
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.spark.sql.hbase.util.BinaryBytesUtils$$anonfun$toInt$1.apply$mcVI$sp(bytesUtils.scala:156)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.sql.hbase.util.BinaryBytesUtils$.toInt(bytesUtils.scala:155)
at org.apache.spark.sql.hbase.util.DataTypeUtils$.setRowColumnFromHBaseRawType(DataTypeUtils.scala:97)
at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:979)
at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:972)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.hbase.HBaseRelation.buildRow(HBaseRelation.scala:971)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:188)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:170)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/10/08 15:15:36 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 6.0 (TID 6, localhost): java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.spark.sql.hbase.util.BinaryBytesUtils$$anonfun$toInt$1.apply$mcVI$sp(bytesUtils.scala:156)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.sql.hbase.util.BinaryBytesUtils$.toInt(bytesUtils.scala:155)
at org.apache.spark.sql.hbase.util.DataTypeUtils$.setRowColumnFromHBaseRawType(DataTypeUtils.scala:97)
at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:979)
at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:972)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.hbase.HBaseRelation.buildRow(HBaseRelation.scala:971)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:188)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:170)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

15/10/08 15:15:36 ERROR scheduler.TaskSetManager: Task 0 in stage 6.0 failed 1 times; aborting job
15/10/08 15:15:36 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
15/10/08 15:15:36 INFO scheduler.TaskSchedulerImpl: Cancelling stage 6
15/10/08 15:15:36 INFO scheduler.DAGScheduler: ResultStage 6 (main at NativeMethodAccessorImpl.java:-2) failed in 0.233 s
15/10/08 15:15:36 INFO scheduler.DAGScheduler: Job 6 failed: main at NativeMethodAccessorImpl.java:-2, took 0.279367 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 6, localhost): java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.spark.sql.hbase.util.BinaryBytesUtils$$anonfun$toInt$1.apply$mcVI$sp(bytesUtils.scala:156)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.sql.hbase.util.BinaryBytesUtils$.toInt(bytesUtils.scala:155)
at org.apache.spark.sql.hbase.util.DataTypeUtils$.setRowColumnFromHBaseRawType(DataTypeUtils.scala:97)
at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:979)
at org.apache.spark.sql.hbase.HBaseRelation$$anonfun$buildRow$1.apply(HBaseRelation.scala:972)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.hbase.HBaseRelation.buildRow(HBaseRelation.scala:971)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anonfun$3.apply(HBaseSQLReaderRDD.scala:72)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:188)
at org.apache.spark.sql.hbase.HBaseSQLReaderRDD$$anon$1.next(HBaseSQLReaderRDD.scala:170)
at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$3.apply(SparkPlan.scala:143)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1765)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
astro> exit;
15/10/08 15:39:40 INFO spark.SparkContext: Invoking stop() from shutdown hook
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/10/08 15:39:40 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/10/08 15:39:40 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
15/10/08 15:39:40 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/10/08 15:39:40 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/10/08 15:39:40 INFO util.Utils: path = /tmp/spark-5e84c9ec-e1b7-4f12-a466-f035c0ca6e7b/blockmgr-1e69a927-6ecd-473f-8897-b5bfa0f4ffe3, already present as root for deletion.
15/10/08 15:39:40 INFO storage.MemoryStore: MemoryStore cleared
15/10/08 15:39:40 INFO storage.BlockManager: BlockManager stopped
15/10/08 15:39:40 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/10/08 15:39:40 INFO spark.SparkContext: Successfully stopped SparkContext
15/10/08 15:39:40 INFO util.Utils: Shutdown hook called
15/10/08 15:39:40 INFO util.Utils: Deleting directory /tmp/spark-5e84c9ec-e1b7-4f12-a466-f035c0ca6e7b
15/10/08 15:39:40 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!

@rkiyer999,

You might try appending "IN StringFormat" to the end of the CREATE TABLE statement, after the closing parenthesis of the MAPPED BY clause.
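
Some context on why that may help: the trace fails inside BinaryBytesUtils.toInt with index 1, which is consistent with an INTEGER column being decoded from a value shorter than four bytes. The scan output shows values such as value=0, and values written with put from the HBase shell are stored as the bytes of their text, so 0 is a single byte. The Python sketch below only illustrates that mismatch. decode_binary_int and decode_string_int are hypothetical helpers, not the project's code, and any sign handling in the real binary format is not modeled.

```python
import struct


def decode_binary_int(raw: bytes) -> int:
    """Hypothetical fixed-width decoder: reads exactly 4 big-endian bytes."""
    value = 0
    for i in range(4):
        # Indexing past the end of a short value raises IndexError,
        # the Python analogue of ArrayIndexOutOfBoundsException.
        value = (value << 8) | raw[i]
    return value


def decode_string_int(raw: bytes) -> int:
    """Hypothetical string-format decoder: parses the bytes as decimal text."""
    return int(raw.decode("utf-8"))


# Assumption: the cell was written from the HBase shell, so it holds the text "0".
shell_value = b"0"

try:
    decode_binary_int(shell_value)
except IndexError:
    # Byte 0 exists, byte 1 does not, so the loop fails at i == 1.
    print("binary decode failed: value is", len(shell_value), "byte(s), expected 4")

print("string decode:", decode_string_int(shell_value))

# A value that really was written as a 4-byte integer decodes without error.
print("binary decode:", decode_binary_int(struct.pack(">i", 42)))
```

If the data had instead been loaded as 4-byte binary integers, the default format would be the right one and the string format would be the mismatch, so it is worth checking how the rows were written before changing the table definition.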