microsoft/LightLDA

How can I train the LDA model on multiple machines?

qinghua2016 opened this issue · 3 comments

As your paper says, the LDA model can be trained on multiple machines, but I can't find instructions for doing so. As far as I know, the training command is "bin/lightlda -num_vocabs 70626 -num_topics 10 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 256 -max_num_document 997733 -input_dir example/data -out_of_core -data_capacity 800". How should I change it to train on multiple machines? @feiga

As example/readme.md says, the command for distributed running with MPI is: "Running with MPI, you just need to run mpiexec --machinefile machine_file lightlda -lightlda_arguments...". So I changed the training command to "mpiexec --machinefile machine_file bin/lightlda -num_vocabs 174481 -num_topics 10 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 1 -max_num_document 850873 -input_dir example/chatdata -out_of_core -data_capacity 800". My machine_file is as follows:
192.168.11.105
192.168.11.118
My machine is 192.168.11.105, and I want to train the model on both my machine (105) and another machine (118).
When I run the command, it asks for the password of machine 118. I enter the correct password for machine 118, but the following error occurs:
qinghua@192.168.11.105's password:
Permission denied, please try again.
qinghua@192.168.11.105's password:
Permission denied, please try again.
qinghua@192.168.11.105's password:
Permission denied (publickey,password).

ORTE was unable to reliably start one or more daemons.
This usually is caused by:

  • not finding the required libraries and/or binaries on
    one or more nodes. Please check your PATH and LD_LIBRARY_PATH
    settings, or configure OMPI with --enable-orterun-prefix-by-default

  • lack of authority to execute on one or more specified nodes.
    Please verify your allocation and authorities.

  • the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
    Please check with your sys admin to determine the correct location to use.

  • compilation of the orted with dynamic libraries when static are required
    (e.g., on Cray). Please check your configure cmd line and consider using
    one of the contrib/platform definitions for your system type.

  • an inability to create a connection back to mpirun due to a
    lack of common network interfaces and/or no route found between
    them. Please check network connectivity (including firewalls
    and network routing requirements).


Did I run the wrong command, or is there some other error? @feiga

Please set the MPI library path in your SSH environment.
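One way to read this advice, as a rough sketch rather than an official LightLDA recipe: make sure the MPI binaries and libraries are on PATH and LD_LIBRARY_PATH for the non-interactive shell that SSH starts on each node. The install prefix `/usr/local/openmpi` and the helper `prepend_path` below are assumptions for illustration. Use the real prefix on your nodes (the library directory may be `lib64` on some systems).

```shell
# Hypothetical install prefix (an assumption); replace with the real one on every node.
MPI_HOME="${MPI_HOME:-/usr/local/openmpi}"

# Illustrative helper: prepend a directory to a colon-separated list
# unless it is already present.
prepend_path() {
    # $1 = current list, $2 = directory to add
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${2}${1:+:$1}" ;;
    esac
}

PATH="$(prepend_path "$PATH" "$MPI_HOME/bin")"
LD_LIBRARY_PATH="$(prepend_path "${LD_LIBRARY_PATH:-}" "$MPI_HOME/lib")"
export PATH LD_LIBRARY_PATH
```

If these lines go into `~/.bashrc`, they probably need to sit near the top: some default `~/.bashrc` files return early for non-interactive shells, so settings placed after that guard are not seen by commands started over SSH. Running `ssh 192.168.11.118 'which orted'` from the launching machine is a quick way to see whether the remote non-interactive environment can find the Open MPI daemon.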

Hi, have you solved this? I have the same problem.

I have added the PATH and LD_LIBRARY_PATH settings, and the two servers can also SSH to each other.
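Being able to SSH between the servers may not be enough if the login still asks for a password, since the "Permission denied (publickey,password)" lines in the log above suggest the launcher could not authenticate. The sketch below checks every host in the machine file for a prompt-free login. The function name `check_passwordless_ssh` is made up for illustration. It relies only on the standard OpenSSH options `-n`, `BatchMode`, and `ConnectTimeout`.

```shell
# Return 0 if every host listed in the machine file accepts a
# non-interactive SSH login (no password prompt), 1 otherwise.
check_passwordless_ssh() {
    machine_file="$1"
    status=0
    # The "|| [ -n ... ]" part handles a last line without a trailing newline.
    while read -r host rest || [ -n "$host" ]; do
        # Skip blank lines and comments.
        case "$host" in ''|'#'*) continue ;; esac
        # -n keeps ssh from consuming the loop's stdin;
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "ok: $host"
        else
            echo "FAILED: $host (try: ssh-copy-id $host)"
            status=1
        fi
    done < "$machine_file"
    return "$status"
}
```

Calling `check_passwordless_ssh machine_file` on the launching machine (105 in this thread) prints one line per host. For any host reported as FAILED, generating a key with `ssh-keygen` and installing it with `ssh-copy-id` is the usual remedy. Note that the machine file lists the local address too, so key-based login from the machine to itself may also be required.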