holdenk/sparkProjectTemplate.g8

Can't run the template project

grassit opened this issue · 1 comment

I followed the instructions at https://github.com/holdenk/sparkProjectTemplate.g8, but got the following errors when running the generated project.

Do I need to prepare inputFile.txt first? What should I write in it?

Thanks.

$ sbt "run inputFile.txt outputFile.txt"
[info] Loading project definition from /home/t/Spark/example/templateproject/sparkproject/project
[info] Set current project to sparkProject (in build file:/home/t/Spark/example/templateproject/sparkproject/)
[info] Compiling 2 Scala sources to /home/t/Spark/example/templateproject/sparkproject/target/scala-2.11/classes...
[warn] Multiple main classes detected.  Run 'show discoveredMainClasses' to see the list

Multiple main classes detected, select one to run:

 [1] com.example.sparkProject.CountingApp
 [2] com.example.sparkProject.CountingLocalApp

Enter number: 2

[info] Running com.example.sparkProject.CountingLocalApp inputFile.txt outputFile.txt
[error] OpenJDK 64-Bit Server VM warning: Ignoring option MaxPermSize; support was removed in 8.0
[error] Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
[error] 20/03/18 14:05:24 WARN Utils: Your hostname, ocean resolves to a loopback address: 127.0.1.1; using 192.168.122.1 instead (on interface virbr0)
[error] 20/03/18 14:05:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
[error] 20/03/18 14:05:24 INFO SparkContext: Running Spark version 2.3.0
[error] WARNING: An illegal reflective access operation has occurred
[error] WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/t/.ivy2/cache/org.apache.hadoop/hadoop-auth/jars/hadoop-auth-2.6.5.jar) to method sun.security.krb5.Config.getInstance()
[error] WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
[error] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
[error] WARNING: All illegal access operations will be denied in a future release
[error] 20/03/18 14:05:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[error] 20/03/18 14:05:25 INFO SparkContext: Submitted application: my awesome app
[error] 20/03/18 14:05:26 INFO SecurityManager: Changing view acls to: t
[error] 20/03/18 14:05:26 INFO SecurityManager: Changing modify acls to: t
[error] 20/03/18 14:05:26 INFO SecurityManager: Changing view acls groups to: 
[error] 20/03/18 14:05:26 INFO SecurityManager: Changing modify acls groups to: 
[error] 20/03/18 14:05:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(t); groups with view permissions: Set(); users  with modify permissions: Set(t); groups with modify permissions: Set()
[error] 20/03/18 14:05:26 INFO Utils: Successfully started service 'sparkDriver' on port 44727.
[error] 20/03/18 14:05:26 INFO SparkEnv: Registering MapOutputTracker
[error] 20/03/18 14:05:26 INFO SparkEnv: Registering BlockManagerMaster
[error] 20/03/18 14:05:26 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
[error] 20/03/18 14:05:26 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
[error] 20/03/18 14:05:26 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-d6445547-75ed-4dd3-a0b9-0cf99d75e01e
[error] 20/03/18 14:05:26 INFO MemoryStore: MemoryStore started with capacity 1048.8 MB
[error] 20/03/18 14:05:26 INFO SparkEnv: Registering OutputCommitCoordinator
[error] 20/03/18 14:05:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
[error] 20/03/18 14:05:27 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.122.1:4040
[error] 20/03/18 14:05:27 INFO Executor: Starting executor ID driver on host localhost
[error] 20/03/18 14:05:27 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 36907.
[error] 20/03/18 14:05:27 INFO NettyBlockTransferService: Server created on 192.168.122.1:36907
[error] 20/03/18 14:05:27 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
[error] 20/03/18 14:05:27 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.122.1, 36907, None)
[error] 20/03/18 14:05:27 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.122.1:36907 with 1048.8 MB RAM, BlockManagerId(driver, 192.168.122.1, 36907, None)
[error] 20/03/18 14:05:27 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.122.1, 36907, None)
[error] 20/03/18 14:05:27 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.122.1, 36907, None)
[error] 20/03/18 14:05:29 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.1 KB, free 1048.7 MB)
[error] 20/03/18 14:05:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.4 KB, free 1048.7 MB)
[error] 20/03/18 14:05:29 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.122.1:36907 (size: 20.4 KB, free: 1048.8 MB)
[error] 20/03/18 14:05:29 INFO SparkContext: Created broadcast 0 from textFile at CountingApp.scala:32
[error] Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/t/Spark/example/templateproject/sparkproject/inputFile.txt
[error] 	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
[error] 	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
[error] 	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
[error] 	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
[error] 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
[error] 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
[error] 	at scala.Option.getOrElse(Option.scala:121)
[error] 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
[error] 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
[error] 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
[error] 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
[error] 	at scala.Option.getOrElse(Option.scala:121)
[error] 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
[error] 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
[error] 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
[error] 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
[error] 	at scala.Option.getOrElse(Option.scala:121)
[error] 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
[error] 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
[error] 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
[error] 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
[error] 	at scala.Option.getOrElse(Option.scala:121)
[error] 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
[error] 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
[error] 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
[error] 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
[error] 	at scala.Option.getOrElse(Option.scala:121)
[error] 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
[error] 	at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
[error] 	at org.apache.spark.Partitioner$$anonfun$4.apply(Partitioner.scala:75)
[error] 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[error] 	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
[error] 	at scala.collection.immutable.List.foreach(List.scala:381)
[error] 	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
[error] 	at scala.collection.immutable.List.map(List.scala:285)
[error] 	at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:75)
[error] 	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:326)
[error] 	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$3.apply(PairRDDFunctions.scala:326)
[error] 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[error] 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[error] 	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
[error] 	at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:325)
[error] 	at com.example.sparkProject.WordCount$.withStopWordsFiltered(WordCount.scala:25)
[error] 	at com.example.sparkProject.Runner$.run(CountingApp.scala:33)
[error] 	at com.example.sparkProject.CountingLocalApp$.delayedEndpoint$com$example$sparkProject$CountingLocalApp$1(CountingApp.scala:16)
[error] 	at com.example.sparkProject.CountingLocalApp$delayedInit$body.apply(CountingApp.scala:10)
[error] 	at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
[error] 	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
[error] 	at scala.App$$anonfun$main$1.apply(App.scala:76)
[error] 	at scala.App$$anonfun$main$1.apply(App.scala:76)
[error] 	at scala.collection.immutable.List.foreach(List.scala:381)
[error] 	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
[error] 	at scala.App$class.main(App.scala:76)
[error] 	at com.example.sparkProject.CountingLocalApp$.main(CountingApp.scala:10)
[error] 	at com.example.sparkProject.CountingLocalApp.main(CountingApp.scala)
[error] 20/03/18 14:05:30 INFO SparkContext: Invoking stop() from shutdown hook
[error] 20/03/18 14:05:30 INFO SparkUI: Stopped Spark web UI at http://192.168.122.1:4040
[error] 20/03/18 14:05:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
[error] 20/03/18 14:05:30 INFO MemoryStore: MemoryStore cleared
[error] 20/03/18 14:05:30 INFO BlockManager: BlockManager stopped
[error] 20/03/18 14:05:30 INFO BlockManagerMaster: BlockManagerMaster stopped
[error] 20/03/18 14:05:30 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
[error] 20/03/18 14:05:30 INFO SparkContext: Successfully stopped SparkContext
[error] 20/03/18 14:05:30 INFO ShutdownHookManager: Shutdown hook called
[error] 20/03/18 14:05:30 INFO ShutdownHookManager: Deleting directory /tmp/spark-edcae93e-feba-4204-91ca-8aaa4519e06a
java.lang.RuntimeException: Nonzero exit code returned from runner: 1
	at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code returned from runner: 1
[error] Total time: 72 s, completed Mar 18, 2020, 2:05:30 PM

@grassit You need to create inputFile.txt in the /home/t/Spark/example/templateproject/sparkproject folder; any plain text file will do. The app will then write its result to /home/t/Spark/example/templateproject/sparkproject/outputFile.txt/ (a directory, since Spark's saveAsTextFile produces a directory of part files rather than a single file) in a format like:

(d,1)
(a,1)
(b,1)
(c,1)
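
For example, here is a minimal end-to-end run, assuming you start from the project root (the words written to inputFile.txt are arbitrary sample input, and part-* is the file pattern Spark uses inside the output directory):

$ cd /home/t/Spark/example/templateproject/sparkproject
$ echo "a b c d" > inputFile.txt
$ sbt "run inputFile.txt outputFile.txt"
$ cat outputFile.txt/part-*

Pick [2] com.example.sparkProject.CountingLocalApp at the prompt so the job runs with a local master. If you rerun, delete outputFile.txt/ first, because Spark refuses to write to an output directory that already exists.

The (word, count) tuples come from a standard word count. Here is a rough Scala sketch of that logic, not the template's exact code (the real implementation is in WordCount.scala and also filters stop words):

import org.apache.spark.{SparkConf, SparkContext}

// Run a word count locally; paths are relative to the working directory.
val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("word count sketch"))
val counts = sc.textFile("inputFile.txt")   // one record per input line
  .flatMap(_.split(" "))                    // split each line into words
  .map(word => (word, 1))                   // pair each word with a count of 1
  .reduceByKey(_ + _)                       // sum the counts per word
counts.saveAsTextFile("outputFile.txt")     // writes a directory of part files
sc.stop()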