How can I train the LDA model on multiple machines?
qinghua2016 opened this issue · 3 comments
As your paper says, the LDA model can be trained on multiple machines, but I can't find instructions for doing so. As far as I know, the training command is "bin/lightlda -num_vocabs 70626 -num_topics 10 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 256 -max_num_document 997733 -input_dir example/data -out_of_core -data_capacity 800". How should I change it to train on multiple machines? @feiga
As example/readme.md says, the command for distributed running with MPI is: "Running with MPI, you just need to run mpiexec --machinefile machine_file lightlda -lightlda_arguments...". So I changed the training command to "mpiexec --machinefile machine_file bin/lightlda -num_vocabs 174481 -num_topics 10 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 1 -max_num_document 850873 -input_dir example/chatdata -out_of_core -data_capacity 800". My machine_file is as follows:
192.168.11.105
192.168.11.118
My machine is 192.168.11.105, and I want to train the model on both it and another machine, 192.168.11.118.
When I run the command, it asks for the password of machine 118. I enter the correct password for machine 118, but the following error occurs:
qinghua@192.168.11.105's password:
Permission denied, please try again.
qinghua@192.168.11.105's password:
Permission denied, please try again.
qinghua@192.168.11.105's password:
Permission denied (publickey,password).
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
- not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default
- lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.
- the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.
- compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.
- an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
Did I run the wrong command, or is there some other error? @feiga
Please set the MPI lib path in your SSH environment.
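One way to follow this advice, as a rough sketch rather than an official LightLDA or Open MPI recipe: export the MPI directories from a startup file that non-interactive ssh sessions read, on every machine in machine_file. With bash that is typically ~/.bashrc, and the lines usually need to sit above any "return if not interactive" guard near the top of that file. The prefix /usr/local/openmpi and the helper name prepend_path are assumptions for illustration; substitute the real install location.

```shell
# Assumed install prefix (hypothetical): change to where Open MPI lives on your machines.
MPI_HOME="${MPI_HOME:-/usr/local/openmpi}"

# Hypothetical helper: prepend a directory to a colon-separated variable,
# but only if it is not already listed there.
prepend_path() {
    var_name="$1"
    dir="$2"
    eval "current=\${$var_name:-}"
    case ":$current:" in
        *":$dir:"*) ;;  # already present, nothing to do
        *) eval "export $var_name=\"\$dir\${current:+:\$current}\"" ;;
    esac
}

# Make the MPI binaries (mpiexec, orted) and shared libraries visible.
prepend_path PATH "$MPI_HOME/bin"
prepend_path LD_LIBRARY_PATH "$MPI_HOME/lib"
```

The guard inside prepend_path keeps the variables from growing each time the file is sourced.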
Hi, have you solved this? I have the same problem.
I have added the PATH and LD_LIBRARY_PATH settings, and the two servers can also ssh to each other.
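Being able to ssh interactively does not by itself show that a launcher can log in without a password prompt, or that a non-interactive shell on the remote side sees the MPI paths. A small check along these lines might help narrow it down. It is only a sketch: the function names check_host and check_machine_file are made up for illustration, and it assumes an Open MPI install whose launch daemon is called orted, as in the error message above.

```shell
# Check one host: can we log in without a password prompt, and does the
# non-interactive shell there find orted (Open MPI's launch daemon)?
check_host() {
    host="$1"
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "$host: passwordless ssh failed"
        return 1
    fi
    if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" 'command -v orted' >/dev/null 2>&1; then
        echo "$host: orted not on PATH for non-interactive ssh"
        return 1
    fi
    echo "$host: ok"
}

# Check every host listed in a machine file (one host per line).
check_machine_file() {
    status=0
    while IFS= read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        # Keep ssh from consuming the rest of the machine file on stdin.
        check_host "$host" </dev/null || status=1
    done < "$1"
    return "$status"
}
```

Running `check_machine_file machine_file` from the launching machine before mpiexec should print one line per host; any line other than "ok" points at either the ssh key setup or the remote PATH.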