mingfang/docker-predictionio

Training failed

Closed this issue · 16 comments

I followed the instructions and everything was OK until the training step, i.e., the
/PredictionIO/bin/pio train command (in run.sh under /quickstartapp/).
My environment is Win7 64-bit with Vagrant/VirtualBox running an Ubuntu 14.04 VM. I'd appreciate your advice.
The point where it hangs is here:
2014-10-06 13:09:26,445 INFO spark.SparkContext - Job finished: count at Workflow.scala:526, took 0.173389707 s
2014-10-06 13:09:26,446 INFO workflow.CoreWorkflow$ - DP 0 has 0 rows
2014-10-06 13:09:26,447 INFO workflow.CoreWorkflow$ - Metrics is null. Stop here
2014-10-06 13:09:26,454 INFO executor.Executor - Finished task 0.0 in stage 6.0 (TID 6). 1731 bytes result sent to driver
2014-10-06 13:09:26,495 INFO spark.SparkContext - Starting job: collect at Workflow.scala:695
2014-10-06 13:09:26,500 INFO scheduler.DAGScheduler - Registering RDD 16 (coalesce at Workflow.scala:694)
2014-10-06 13:09:26,503 INFO scheduler.DAGScheduler - Got job 7 (collect at Workflow.scala:695) with 1 output partitions (allowLocal=false)
2014-10-06 13:09:26,503 INFO scheduler.DAGScheduler - Final stage: Stage 7(collect at Workflow.scala:695)
2014-10-06 13:09:26,506 INFO scheduler.DAGScheduler - Parents of final stage: List(Stage 8)
2014-10-06 13:09:26,509 INFO scheduler.DAGScheduler - Missing parents: List(Stage 8)
2014-10-06 13:09:26,514 INFO scheduler.DAGScheduler - Submitting Stage 8 (MapPartitionsRDD[16] at coalesce at Workflow.scala:694), which has no missing parents
2014-10-06 13:09:26,525 INFO storage.MemoryStore - ensureFreeSpace(7240) called with curMem=76984, maxMem=280248975
2014-10-06 13:09:26,529 INFO storage.MemoryStore - Block broadcast_7 stored as values in memory (estimated size 7.1 KB, free 267.2 MB)
2014-10-06 13:09:26,535 INFO scheduler.DAGScheduler - Submitting 1 missing tasks from Stage 8 (MapPartitionsRDD[16] at coalesce at Workflow.scala:694)
2014-10-06 13:09:26,537 INFO scheduler.TaskSchedulerImpl - Adding task set 8.0 with 1 tasks
2014-10-06 13:09:26,541 INFO scheduler.TaskSetManager - Starting task 0.0 in stage 8.0 (TID 7, localhost, ANY, 1445 bytes)
2014-10-06 13:09:26,543 INFO executor.Executor - Running task 0.0 in stage 8.0 (TID 7)
2014-10-06 13:09:26,560 INFO storage.BlockManager - Found block rdd_2_0 locally

Can you please send me the entire output of the run.sh script?

Sorry for the late reply; here it is:
gist file

I'm also encountering the same problem; can someone fix it?
I used the top command to see how much CPU is consumed, and it is only about 1.7 percent.

I used strace to follow the process:
strace -p 5448
Process 5448 attached - interrupt to quit
futex(0x7fa981a669d0, FUTEX_WAIT, 5466, NULL

I have the same problem. After 10 minutes of waiting, there is still no response.

If I kill this job in the Spark interface, everything breaks and I cannot continue.

Maybe the quickstart is wrong. I followed the tutorial under Tutorials & Samples, "Building Movie Recommendation App with Sample Code" (http://docs.prediction.io/0.8.0/tutorials/engines/itemrec/movielens.html), and did not encounter the problem. You could try that for now and come back to fix this problem later.

Are you saying that if you don't use the quickstart sample, everything works without this bug?

No, that's not what I mean. I'm just saying you can set this aside and fix it later, once you have learned the project; if you know nothing about the project, you are simply blocked here. Or you can wait for someone to come along and fix it.

Yes I'm blocked :) I'm new to PredictionIO

Same here.

The question is: did the previous version have the same bug, or is it only in the new 0.8 release?

I don't know. The project is composed of Hadoop, HBase, Spark, and Elasticsearch, all of which are big open source projects, and lots of people maintain PredictionIO. You can also ask questions in the PredictionIO Google group: https://groups.google.com/forum/#!forum/predictionio-user

What operating system are you on, @dataliven?

@dataliven @mingfang

This has to do with the fact that this is running on a Linux machine. Per their documentation at http://docs.prediction.io/0.8.1/install/install-linux.html we need to run Spark in standalone cluster mode.

The solution is to run /spark/sbin/start-master.sh and then update run.sh to:

/PredictionIO/bin/pio train -- --master spark://`hostname`:7077
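
For anyone adapting this, the two steps can be sketched as a small wrapper. This is only a sketch: the /spark and /PredictionIO paths come from this thread, while the SPARK_HOME, PIO_HOME, SPARK_PORT, and RUN variable names and the helper functions are hypothetical, and port 7077 is taken from the command above rather than checked against your setup. By default it only prints the train command so you can inspect it first.

```shell
#!/bin/sh
# Sketch only. The variable names below are assumptions for illustration;
# the default paths and port are the ones quoted in this thread.
SPARK_HOME="${SPARK_HOME:-/spark}"
PIO_HOME="${PIO_HOME:-/PredictionIO}"
SPARK_PORT="${SPARK_PORT:-7077}"

# Build the standalone master URL from this machine's hostname.
master_url() {
    printf 'spark://%s:%s\n' "$(hostname)" "$SPARK_PORT"
}

# Build the train command that would replace the one in run.sh.
train_command() {
    printf '%s/bin/pio train -- --master %s\n' "$PIO_HOME" "$(master_url)"
}

if [ "${RUN:-0}" = "1" ]; then
    # Start the standalone master, then train against it.
    "$SPARK_HOME/sbin/start-master.sh"
    "$PIO_HOME/bin/pio" train -- --master "$(master_url)"
else
    # Dry run: show the command without starting anything.
    train_command
fi
```

Running it with RUN=1 set in the environment performs both steps; without it, nothing is started.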

Good luck.

I've upgraded to version 0.8.2 and changed the quickstart as described here: http://docs.prediction.io/0.8.2/recommendation/quickstart.html

Please test the latest version and let me know if I can close this.