TripleReader/TripleWriter example fails
Closed this issue · 1 comment
earthquakesan commented
When I use hdfs://namenode:8020/usr/hue/rdf.nt as input, the job fails with:
app_1 | Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 5: hdfs:
app_1 | at org.apache.hadoop.fs.Path.initialize(Path.java:205)
app_1 | at org.apache.hadoop.fs.Path.<init>(Path.java:171)
app_1 | at org.apache.hadoop.fs.Path.<init>(Path.java:93)
app_1 | at org.apache.hadoop.fs.Globber.glob(Globber.java:211)
app_1 | at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1676)
app_1 | at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
app_1 | at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
app_1 | at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
app_1 | at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
app_1 | at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
app_1 | at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
app_1 | at scala.Option.getOrElse(Option.scala:121)
app_1 | at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
app_1 | at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
app_1 | at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
app_1 | at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
app_1 | at scala.Option.getOrElse(Option.scala:121)
app_1 | at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
app_1 | at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
app_1 | at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
app_1 | at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
app_1 | at scala.Option.getOrElse(Option.scala:121)
app_1 | at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
app_1 | at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1333)
app_1 | at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
app_1 | at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
app_1 | at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
app_1 | at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
app_1 | at net.sansa_stack.examples.spark.rdf.TripleReader$.main(TripleReader.scala:42)
app_1 | at net.sansa_stack.examples.spark.rdf.TripleReader.main(TripleReader.scala)
app_1 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
app_1 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
app_1 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
app_1 | at java.lang.reflect.Method.invoke(Method.java:498)
app_1 | at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
app_1 | at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
app_1 | at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
app_1 | at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
app_1 | at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
app_1 | Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 5: hdfs:
app_1 | at java.net.URI$Parser.fail(URI.java:2848)
app_1 | at java.net.URI$Parser.failExpecting(URI.java:2854)
app_1 | at java.net.URI$Parser.parse(URI.java:3057)
app_1 | at java.net.URI.<init>(URI.java:746)
app_1 | at org.apache.hadoop.fs.Path.initialize(Path.java:202)
app_1 | ... 38 more
app_1 | 17/05/09 13:25:31 INFO spark.SparkContext: Invoking stop() from shutdown hook
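The IllegalArgumentException above comes from Hadoop's Path parser receiving only the bare scheme `hdfs:` instead of the full URI ("Expected scheme-specific part at index 5" means nothing followed the colon). That suggests the input string was split or truncated somewhere before it reached FileInputFormat, though that is an inference from the message alone. A minimal sketch that reproduces the same failure outside Spark:

```scala
import org.apache.hadoop.fs.Path

// Repro sketch: Hadoop's Path parses "hdfs://host:port/path" fine, but a bare
// scheme fails with exactly the URISyntaxException seen in the trace above.
object PathRepro {
  def main(args: Array[String]): Unit = {
    new Path("hdfs://namenode:8020/usr/hue/rdf.nt") // ok: full URI
    new Path("hdfs:") // throws IllegalArgumentException:
                      // "Expected scheme-specific part at index 5: hdfs:"
  }
}
```

If that is the cause, the place to look is how the example parses its input argument before handing it to the RDF loader.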
Using a local file as input gets me a bit further:
app_1 | 17/05/09 13:29:53 INFO executor.Executor: Adding file:/tmp/spark-e183856c-413e-4154-a887-810cb784a3ba/userFiles-e91abb7b-a606-4f99-9219-002749c9c078/sansa-examples-spark-2016-12.jar to class loader
app_1 | 17/05/09 13:29:53 INFO rdd.HadoopRDD: Input split: file:/rdf/rdf.nt:0+8392
app_1 | 17/05/09 13:29:53 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
app_1 | 17/05/09 13:29:53 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
app_1 | 17/05/09 13:29:53 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
app_1 | 17/05/09 13:29:53 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
app_1 | 17/05/09 13:29:53 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
app_1 | 17/05/09 13:29:54 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
app_1 | java.lang.ExceptionInInitializerError
app_1 | at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$load$1.apply(NTripleReader.scala:27)
app_1 | at net.sansa_stack.rdf.spark.io.NTripleReader$$anonfun$load$1.apply(NTripleReader.scala:26)
app_1 | at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
app_1 | at scala.collection.Iterator$$anon$10.next(Iterator.scala:393)
app_1 | at scala.collection.Iterator$class.foreach(Iterator.scala:893)
app_1 | at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
app_1 | at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
app_1 | at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
app_1 | at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
app_1 | at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
app_1 | at scala.collection.AbstractIterator.to(Iterator.scala:1336)
app_1 | at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
app_1 | at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
app_1 | at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
app_1 | at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
app_1 | at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
app_1 | at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$29.apply(RDD.scala:1354)
app_1 | at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
app_1 | at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
app_1 | at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
app_1 | at org.apache.spark.scheduler.Task.run(Task.scala:99)
app_1 | at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
app_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
app_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
app_1 | at java.lang.Thread.run(Thread.java:745)
app_1 | Caused by: java.lang.NullPointerException
app_1 | at org.apache.jena.tdb.sys.EnvTDB.processGlobalSystemProperties(EnvTDB.java:33)
app_1 | at org.apache.jena.tdb.TDB.init(TDB.java:248)
app_1 | at org.apache.jena.tdb.sys.InitTDB.start(InitTDB.java:29)
app_1 | at org.apache.jena.system.JenaSystem.lambda$init$1(JenaSystem.java:111)
app_1 | at java.util.ArrayList.forEach(ArrayList.java:1249)
app_1 | at org.apache.jena.system.JenaSystem.forEach(JenaSystem.java:186)
app_1 | at org.apache.jena.system.JenaSystem.forEach(JenaSystem.java:163)
app_1 | at org.apache.jena.system.JenaSystem.init(JenaSystem.java:109)
app_1 | at org.apache.jena.riot.RDFDataMgr.<clinit>(RDFDataMgr.java:81)
app_1 | ... 25 more
app_1 | 17/05/09 13:29:54 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.ExceptionInInitializerError
app_1 | [same ExceptionInInitializerError stack trace as above, caused by the same NullPointerException in EnvTDB]
app_1 |
app_1 | 17/05/09 13:29:54 ERROR scheduler.TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
app_1 | 17/05/09 13:29:54 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
app_1 | 17/05/09 13:29:54 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
app_1 | 17/05/09 13:29:54 INFO scheduler.DAGScheduler: ResultStage 0 (take at TripleReader.scala:40) failed in 3.923 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.ExceptionInInitializerError
app_1 | [same ExceptionInInitializerError stack trace as above, caused by the same NullPointerException in EnvTDB]
app_1 |
app_1 | Driver stacktrace:
app_1 | 17/05/09 13:29:54 INFO scheduler.DAGScheduler: Job 0 failed: take at TripleReader.scala:40, took 3.991583 s
app_1 | Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.ExceptionInInitializerError
app_1 | [same ExceptionInInitializerError stack trace as above, caused by the same NullPointerException in EnvTDB]
app_1 |
app_1 | Driver stacktrace:
app_1 | at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
app_1 | at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
app_1 | at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
app_1 | at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
app_1 | at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
app_1 | at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
app_1 | at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
app_1 | at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
app_1 | at scala.Option.foreach(Option.scala:257)
app_1 | at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
app_1 | at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
app_1 | at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
app_1 | at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
app_1 | at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
app_1 | at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
app_1 | at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
app_1 | at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
app_1 | at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
app_1 | at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
app_1 | at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
app_1 | at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
app_1 | at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
app_1 | at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
app_1 | at net.sansa_stack.examples.spark.rdf.TripleReader$.main(TripleReader.scala:40)
app_1 | at net.sansa_stack.examples.spark.rdf.TripleReader.main(TripleReader.scala)
app_1 | at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
app_1 | at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
app_1 | at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
app_1 | at java.lang.reflect.Method.invoke(Method.java:498)
app_1 | at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
app_1 | at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
app_1 | at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
app_1 | at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
app_1 | at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
app_1 | Caused by: java.lang.ExceptionInInitializerError
app_1 | [same ExceptionInInitializerError stack trace as above, caused by the same NullPointerException in EnvTDB]
LorenzBuehmann commented
There were two problems:
- a bug in the RDF layer when loading from HDFS
- an issue with the Maven Shade plugin when building the jar package

Both have been fixed in the latest commit.
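For context, the NullPointerException from EnvTDB in the traces above is a typical symptom of Jena's ServiceLoader-based initialization breaking inside a shaded jar: the Maven Shade plugin has to merge the `META-INF/services` files (e.g. with its `ServicesResourceTransformer`) or `JenaSystem` cannot find all subsystem initializers. A minimal sketch, assuming a rebuilt fat jar, for verifying that Jena initializes cleanly before running the full Spark job (the object name and test file are illustrative):

```scala
import org.apache.jena.riot.{Lang, RDFDataMgr}
import org.apache.jena.system.JenaSystem

// Run this from the shaded jar: if the services files were not merged correctly,
// JenaSystem.init() fails with the same NullPointerException from EnvTDB as above.
object JenaInitCheck {
  def main(args: Array[String]): Unit = {
    JenaSystem.init() // forces all Jena subsystem initializers to run eagerly
    val model = RDFDataMgr.loadModel("rdf.nt", Lang.NTRIPLES)
    println(s"Jena initialized, parsed ${model.size()} triples")
  }
}
```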