deanwampler/spark-scala-tutorial

Problem with filtering in Intro1.sc

Closed this issue · 4 comments

I've got a problem running the code from Intro1.sc in my REPL. Stack trace below. Any ideas how to deal with it?

scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext

scala> import org.apache.spark.SparkContext._
import org.apache.spark.SparkContext._

scala> // The val keyword declares a read-only variable: "value".

scala> val sc = new SparkContext("local", "Intro (1)")
14/05/29 13:45:27 WARN Utils: Your hostname, work resolves to a loopback address: 127.0.1.1; using 192.168.122.1 instead (on interface virbr0)
14/05/29 13:45:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@61a6dde4

scala>

scala> // Load the King James Version of the Bible, then convert

scala> // each line to lower case, creating an RDD.

scala> val input = sc.textFile("data/kjvdat.txt").map(line => line.toLowerCase)
input: org.apache.spark.rdd.RDD[String] = MappedRDD[2] at map at <console>:12

scala>

scala> // Cache the data in memory for faster, repeated retrieval.

scala> input.cache
res1: org.apache.spark.rdd.RDD[String] = MappedRDD[2] at map at <console>:12

scala>

scala> // Find all verses that mention "sin".

scala> val count = sins.count()         // How many sins?
14/05/29 13:46:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/05/29 13:46:01 WARN LoadSnappy: Snappy native library not loaded
14/05/29 13:46:01 ERROR Executor: Exception in task ID 0
java.lang.ClassNotFoundException: $line10.$read$$iw$$iw$$iw$$iw$$anonfun$1
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
        at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
        at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
        at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/05/29 13:46:01 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/05/29 13:46:01 WARN TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: $line10.$read$$iw$$iw$$iw$$iw$$anonfun$1
        ... (same stack trace as above)
14/05/29 13:46:01 ERROR TaskSetManager: Task 0.0:0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: $anonfun$1)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I just tried it and it seems to work for me. I went into the tutorial folder, typed "sbt", then "console", then ":paste", pasted the code, and typed Control-D, and it processed fine in the REPL.
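
For reference, that sequence looks roughly like this (a sketch; run it from the tutorial's root directory so sbt picks up the project's Spark dependency):

$ sbt
> console
scala> :paste
// Entering paste mode (ctrl-D to finish)
<paste the contents of Intro1.sc here, then press Ctrl-D>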

Piotr,

Did you use the console in sbt, as Vincent mentioned, or the bare scala command line? If the sbt console doesn't work either, I noticed this log message:

scala> val sc = new SparkContext("local", "Intro (1)")
14/05/29 13:45:27 WARN Utils: Your hostname, work resolves to a loopback address: 127.0.1.1; using 192.168.122.1 instead (on interface virbr0)
14/05/29 13:45:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address

I'm not sure that's the culprit, but the stack trace shows the executor failing to load a class the REPL generated, and loading those classes can be sensitive to how the driver's address is resolved.
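
If the address binding is indeed the problem, one thing worth trying is pinning the driver to localhost explicitly rather than letting Spark guess from the hostname. A minimal sketch, assuming the spark.driver.host setting (or, equivalently, SPARK_LOCAL_IP in the environment, as the warning itself suggests) is honored here:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a possible workaround, not a confirmed fix: bind the driver
// to localhost instead of the address Spark inferred from the hostname.
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("Intro (1)")
  .set("spark.driver.host", "127.0.0.1")
val sc = new SparkContext(conf)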

What OS are you using? This is what I have in my /etc/hosts on my Mac:

127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
fe80::1%lo0 localhost

Finally, can you run any of the other exercises? I suspect all of them will
throw the same error.

dean

Piotr, okay if I close this or are you still having problems with it? Thanks.

You may close it. I no longer have access to the workstation where this error occurred. Thanks.
