Invalid topic assignment N from word proposal
ylqfp opened this issue · 4 comments
Mem: 32GB
bin/lightlda -num_vocabs 141043 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 22 -num_blocks 33 -max_num_document 1500000 -input_dir ./splitout -data_capacity 24000
Total doc number: 32800000
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 309 MB, Alias capacity: 512 MB, Delta capacity: 230MB
[INFO] [2016-05-03 10:50:51] INFO: block = 0, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 310 MB, Alias capacity: 512 MB, Delta capacity: 231MB
[INFO] [2016-05-03 10:50:51] INFO: block = 1, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 311 MB, Alias capacity: 512 MB, Delta capacity: 232MB
[INFO] [2016-05-03 10:50:51] INFO: block = 2, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 312 MB, Alias capacity: 512 MB, Delta capacity: 233MB
[INFO] [2016-05-03 10:50:51] INFO: block = 3, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 4, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 5, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 6, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 7, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 8, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 310 MB, Alias capacity: 512 MB, Delta capacity: 231MB
[INFO] [2016-05-03 10:50:51] INFO: block = 9, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 311 MB, Alias capacity: 512 MB, Delta capacity: 232MB
[INFO] [2016-05-03 10:50:51] INFO: block = 10, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 312 MB, Alias capacity: 512 MB, Delta capacity: 233MB
[INFO] [2016-05-03 10:50:51] INFO: block = 11, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 12, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 13, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 14, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 15, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 16, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 310 MB, Alias capacity: 512 MB, Delta capacity: 230MB
[INFO] [2016-05-03 10:50:51] INFO: block = 17, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 311 MB, Alias capacity: 512 MB, Delta capacity: 232MB
[INFO] [2016-05-03 10:50:51] INFO: block = 18, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 312 MB, Alias capacity: 512 MB, Delta capacity: 233MB
[INFO] [2016-05-03 10:50:51] INFO: block = 19, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 20, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 21, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 22, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 23, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 24, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 309 MB, Alias capacity: 512 MB, Delta capacity: 230MB
[INFO] [2016-05-03 10:50:51] INFO: block = 25, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 311 MB, Alias capacity: 512 MB, Delta capacity: 231MB
[INFO] [2016-05-03 10:50:51] INFO: block = 26, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 312 MB, Alias capacity: 512 MB, Delta capacity: 232MB
[INFO] [2016-05-03 10:50:51] INFO: block = 27, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 28, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 29, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 235MB
[INFO] [2016-05-03 10:50:51] INFO: block = 30, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 31, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Actual Model capacity: 313 MB, Alias capacity: 512 MB, Delta capacity: 234MB
[INFO] [2016-05-03 10:50:51] INFO: block = 32, the number of slice = 2
[INFO] [2016-05-03 10:50:51] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2016-05-03 10:50:51] Server 0: Worker registratrion completed: workers=1 trainers=22 servers=1
[INFO] [2016-05-03 10:50:51] Rank 0/1: Multiverso initialized successfully.
[INFO] [2016-05-03 10:51:09] Rank 0/1: Begin of configuration and initialization.
[DEBUG] [2016-05-03 11:00:50] Request params. start = 1, end = 133817
[INFO] [2016-05-03 11:00:53] Rank = 0, Iter = 0, Block = 0, Slice = 0
[DEBUG] [2016-05-03 11:00:53] Request params. start = 133818, end = 141042
[INFO] [2016-05-03 11:00:53] [FATAL] [2016-05-03 11:00:53] Invalid topic assignment 681102570 from word proposal
[FATAL] [2016-05-03 11:00:53] Rank = 0, Alias Time used: 6.48 s
Is this caused by an error in my configuration?
@feiga Thanks for your reply! Indeed, I made some mistakes when generating the tf file. I'll try again.
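Since the crash traced back to a malformed tf file, a quick sanity check on the input before converting it may save a run. The sketch below assumes a libsvm-style text format with one document per line, `doc_id<TAB>word_id:count word_id:count ...`. That format and the helper name `validate_line` are assumptions for illustration, not LightLDA's documented API. The check flags word ids outside `[0, num_vocabs)` and non-positive counts, using the `-num_vocabs 141043` value from the command above.

```python
def validate_line(line: str, num_vocabs: int) -> list:
    """Return a list of problems found in one document line (empty if OK).

    Assumed (hypothetical) format: "doc_id<TAB>word_id:count word_id:count ..."
    """
    problems = []
    parts = line.rstrip("\n").split("\t", 1)
    if len(parts) != 2:
        return ["missing tab between doc id and tokens"]
    doc_id, body = parts
    for token in body.split():
        word, sep, count = token.partition(":")
        if not sep:
            problems.append(f"doc {doc_id}: token {token!r} has no ':'")
            continue
        try:
            w, c = int(word), int(count)
        except ValueError:
            problems.append(f"doc {doc_id}: non-integer token {token!r}")
            continue
        # Word ids must index into the vocabulary declared by -num_vocabs.
        if not 0 <= w < num_vocabs:
            problems.append(f"doc {doc_id}: word id {w} outside [0, {num_vocabs})")
        # A zero or negative count is almost certainly a preprocessing bug.
        if c <= 0:
            problems.append(f"doc {doc_id}: non-positive count {c} for word {w}")
    return problems


if __name__ == "__main__":
    sample = ["0\t5:2 17:1", "1\t141043:3 9:0", "2 no-tab-here"]
    for line in sample:
        for p in validate_line(line, num_vocabs=141043):
            print(p)
```

Running it over the whole file (streaming line by line) keeps memory flat even for tens of millions of documents.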
Additionally, could you please explain the relationship between the number of workers, the number of trainers, and the number of blocks?
If I have a data file with 30,000,000 documents and 100,000 unique words on a single machine, how should I set these parameters?
Thanks!
Multiple blocks on one machine are for out-of-core computing. If your memory can't hold all the data, you can split it into multiple blocks, and they will be loaded from disk and trained one by one. If you have enough memory, a single block is fine.
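As a rough illustration of the out-of-core split, here is a sketch that estimates how many blocks are needed if each block may hold at most a fixed number of documents. It assumes that `-max_num_document` caps the documents per block; that reading is consistent with the original run (32,800,000 documents, `-num_blocks 33`, `-max_num_document 1500000`) but is an inference, not documented behavior. Both function names are hypothetical. The memory taken by each block's tokens may well be the tighter limit, so treat the result as a lower bound.

```python
import math


def min_blocks(total_docs: int, max_docs_per_block: int) -> int:
    """Smallest block count such that no block exceeds max_docs_per_block docs.

    Hypothetical helper; assumes documents are spread evenly over blocks.
    """
    if max_docs_per_block <= 0:
        raise ValueError("max_docs_per_block must be positive")
    return math.ceil(total_docs / max_docs_per_block)


def docs_per_block(total_docs: int, num_blocks: int) -> int:
    """Documents in the largest block when splitting as evenly as possible."""
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")
    return math.ceil(total_docs / num_blocks)


if __name__ == "__main__":
    # Numbers from the original run: 32.8M docs, 33 blocks, cap of 1.5M docs.
    print(docs_per_block(32_800_000, 33))      # 993940
    print(min_blocks(32_800_000, 1_500_000))   # 22
    # If memory can hold everything, one block with a large enough cap works.
    print(min_blocks(30_000_000, 30_000_000))  # 1
```

With 33 blocks each block holds under 1,000,000 documents, well below the 1,500,000 cap, so the original block count was more conservative than the cap alone requires.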
The number of trainers is the number of threads used for training, so it depends on how many cores your machine has. Usually you can set it a little lower than the number of cores, because the system also runs some background threads for communication.
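The log line `trainers=22` matches `-num_local_workers 22` in the original command, so that flag appears to set the trainer (thread) count. A minimal sketch for picking it from the core count while leaving a few cores for the background communication threads follows; the default reserve of 2 and the function name are assumptions, not a LightLDA recommendation.

```python
import os
from typing import Optional


def suggest_num_local_workers(reserved: int = 2, cores: Optional[int] = None) -> int:
    """Suggest a trainer thread count slightly below the number of cores.

    `reserved` cores are left for background (communication) threads;
    the default of 2 is an assumption, tune it for your machine.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cores - reserved)


if __name__ == "__main__":
    n = suggest_num_local_workers()
    print(f"-num_local_workers {n}")
```

For example, on a 24-core machine with the default reserve this returns 22; on very small machines it never goes below one trainer.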