Training failed

Question


Closed this issue 8 years ago · 16 comments

I followed the instructions and everything worked until the training step, i.e., /PredictionIO/bin/pio train (in run.sh under /quickstartapp/).
My environment is Win7 64-bit with Vagrant and VirtualBox running an Ubuntu 14.04 VM. I'd appreciate your advice.
Training hangs at this point:
2014-10-06 13:09:26,445 INFO spark.SparkContext - Job finished: count at Workflow.scala:526, took 0.173389707 s
2014-10-06 13:09:26,446 INFO workflow.CoreWorkflow$ - DP 0 has 0 rows
2014-10-06 13:09:26,447 INFO workflow.CoreWorkflow$ - Metrics is null. Stop here
2014-10-06 13:09:26,454 INFO executor.Executor - Finished task 0.0 in stage 6.0 (TID 6). 1731 bytes result sent to driver
2014-10-06 13:09:26,495 INFO spark.SparkContext - Starting job: collect at Workflow.scala:695
2014-10-06 13:09:26,500 INFO scheduler.DAGScheduler - Registering RDD 16 (coalesce at Workflow.scala:694)
2014-10-06 13:09:26,503 INFO scheduler.DAGScheduler - Got job 7 (collect at Workflow.scala:695) with 1 output partitions (allowLocal=false)
2014-10-06 13:09:26,503 INFO scheduler.DAGScheduler - Final stage: Stage 7(collect at Workflow.scala:695)
2014-10-06 13:09:26,506 INFO scheduler.DAGScheduler - Parents of final stage: List(Stage 8)
2014-10-06 13:09:26,509 INFO scheduler.DAGScheduler - Missing parents: List(Stage 8)
2014-10-06 13:09:26,514 INFO scheduler.DAGScheduler - Submitting Stage 8 (MapPartitionsRDD[16] at coalesce at Workflow.scala:694), which has no missing parents
2014-10-06 13:09:26,525 INFO storage.MemoryStore - ensureFreeSpace(7240) called with curMem=76984, maxMem=280248975
2014-10-06 13:09:26,529 INFO storage.MemoryStore - Block broadcast_7 stored as values in memory (estimated size 7.1 KB, free 267.2 MB)
2014-10-06 13:09:26,535 INFO scheduler.DAGScheduler - Submitting 1 missing tasks from Stage 8 (MapPartitionsRDD[16] at coalesce at Workflow.scala:694)
2014-10-06 13:09:26,537 INFO scheduler.TaskSchedulerImpl - Adding task set 8.0 with 1 tasks
2014-10-06 13:09:26,541 INFO scheduler.TaskSetManager - Starting task 0.0 in stage 8.0 (TID 7, localhost, ANY, 1445 bytes)
2014-10-06 13:09:26,543 INFO executor.Executor - Running task 0.0 in stage 8.0 (TID 7)
2014-10-06 13:09:26,560 INFO storage.BlockManager - Found block rdd_2_0 locally

Answer 1 · 2014-10-06T13:54:06.000Z

Can you please send me the entire output of the run.sh script?

Answer 2 · 2014-10-07T10:41:36.000Z

Sorry for the late reply, here it is:
gist file

Answer 3 · 2014-10-14T11:47:21.000Z

I'm also encountering the same problem. Can someone fix it?
I used top to check CPU usage, and the process is only at about 1.7 percent.

Answer 4 · 2014-10-14T11:51:28.000Z

I used strace to attach to the process:
strace -p 5448
Process 5448 attached - interrupt to quit
futex(0x7fa981a669d0, FUTEX_WAIT, 5466, NULL
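A note on reading that output: strace -p without -f attaches only to the one thread ID given, while pio train runs inside a JVM with many threads. A single futex(FUTEX_WAIT, ..., NULL) call means that one thread is sleeping with no timeout until another thread wakes it, so it does not show what the rest of the process is doing. Together with the ~1.7% CPU from top, it does suggest the process is waiting rather than computing. A minimal sketch for checking every thread at once follows; it is Linux-only, assumes /proc is mounted, and the function name thread_states is hypothetical. For the Java side, the JDK's jstack <pid> prints a stack trace per thread, and strace -f -p <pid> attaches to all threads.

```shell
#!/bin/sh
# Sketch (Linux only): print "TID STATE" for every thread of a process,
# read from /proc/<pid>/task/<tid>/stat.
# States: R = running, S = sleeping, D = uninterruptible wait, T = stopped, Z = zombie.
# thread_states is a hypothetical helper name, not part of any tool.
thread_states() {
  pid="$1"
  for stat in /proc/"$pid"/task/*/stat; do
    # The thread ID is the name of the directory that holds this stat file.
    tid=$(basename "$(dirname "$stat")")
    # Field 2 (the command name) may contain spaces, so strip everything up to
    # the last ") " and then take the next field, which is the state letter.
    state=$(sed 's/^.*) //' "$stat" | cut -d' ' -f1)
    echo "$tid $state"
  done
}

# Defaults to this shell's own PID so the sketch can be tried anywhere;
# pass the PID of the hanging pio/java process instead (e.g. 5448).
thread_states "${1:-$$}"
```

If nearly every thread of the training process reports S while top shows little CPU, the job is blocked waiting on something (for example, on Spark scheduling) rather than busy training.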

Answer 5 · 2014-10-16T12:11:09.000Z

I have the same problem. After 10 minutes of waiting, there is still no response.

If I kill this job in the Spark interface, everything breaks and I cannot continue.

Answer 6 · 2014-10-16T12:15:07.000Z

Maybe the quickstart is wrong. I followed Tutorials & Samples, Building Movie Recommendation App with Sample Code (url: http://docs.prediction.io/0.8.0/tutorials/engines/itemrec/movielens.html) and did not run into the problem. You could try that instead and come back to fix this problem later.

Answer 7 · 2014-10-16T12:19:06.000Z

So you're saying that if you don't use the quickstart sample, everything works without this bug?

Answer 8 · 2014-10-16T12:39:38.000Z

No, that's not what I mean. I just mean you can set this aside and fix it later, once you have learned the project; if you don't know anything about the project yet, you will simply stay blocked here. Or you can wait for someone to come and fix it.

Answer 9 · 2014-10-16T12:40:40.000Z

Yes, I'm blocked :) I'm new to PredictionIO.

Answer 10 · 2014-10-16T12:41:30.000Z

Same for me.

Answer 11 · 2014-10-16T12:43:53.000Z

The question is: did the previous version have the same bug, or is it only in the new 0.8 release?

Answer 12 · 2014-10-16T12:55:52.000Z

I don't know. The project comprises Hadoop, HBase, Spark, and Elasticsearch, all of which are large open source projects, and lots of people maintain PredictionIO. You can also ask questions on the PredictionIO Google group, url: https://groups.google.com/forum/#!forum/predictionio-user

Answer 13 · 2014-11-18T16:24:05.000Z

What operating system are you on @dataliven ?

Answer 14 · 2014-11-18T17:50:23.000Z

@dataliven @mingfang

This has to do with the fact that this is running on a Linux machine. Per their documentation at http://docs.prediction.io/0.8.1/install/install-linux.html, we need to run Spark in standalone cluster mode.

The solution is to run /spark/sbin/start-master.sh and then update run.sh to:

/PredictionIO/bin/pio train -- --master spark://`hostname`:7077

Good luck.
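A minimal sketch of the updated run.sh, assuming Spark is installed at /spark and PredictionIO at /PredictionIO as in this VM. The DRY_RUN switch and the run helper are hypothetical additions so the script can be read or tried without a Spark install: by default it only prints the commands, and DRY_RUN=0 executes them. $(hostname) is the same command substitution as the backticks in the command above. Spark's standalone mode also needs at least one worker registered with the master before a job gets executors; this thread does not cover that step, so if training still waits, check the master's web UI (port 8080 by default).

```shell
#!/bin/sh
# Sketch of run.sh for Spark standalone mode.
# Assumed paths: Spark in /spark, PredictionIO in /PredictionIO (as in this VM).
SPARK_DIR=/spark
PIO_BIN=/PredictionIO/bin/pio
# Standalone master URL; 7077 is the master's default port.
MASTER_URL="spark://$(hostname):7077"

# Hypothetical helper: with DRY_RUN=1 (the default) print the command,
# with DRY_RUN=0 actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Start the standalone Spark master.
# 2. Train only if step 1 succeeded; PredictionIO forwards the arguments
#    after "--" to Spark.
run "$SPARK_DIR/sbin/start-master.sh" &&
  run "$PIO_BIN" train -- --master "$MASTER_URL"
```

Chaining the two steps with && keeps pio train from starting against a master that failed to come up.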

Answer 15 · 2014-11-23T17:12:18.000Z

I've upgraded to version 0.8.2 and changed the quickstart as described here: http://docs.prediction.io/0.8.2/recommendation/quickstart.html

Answer 16 · 2014-11-27T03:37:30.000Z

Please test the latest version and let me know if I can close this.