dmlc/parameter_server

How to train a model on a fixed dataset using SGD?

cuihenggang opened this issue · 1 comment

Hi,

I assume the "async_sgd" app is the one to use if we are going to train a model using SGD (correct me if I'm wrong).

And if I understand correctly, the "async_sgd" app goes through the data multiple times (controlled by the "max_pass_of_data" config), but the printed "loss" is the loss averaged over all of these data passes (the loss of the first pass also gets averaged in). Is that correct?

Since we are using a fixed dataset, we would like to train the model in a batch fashion. But we find that the "darlin" app uses BCD instead of SGD. Is it possible to train on batched data using SGD? It would also be useful if the async_sgd app could print out the loss of the current model over just one copy of the input dataset during training.

Thanks,
Cui

mli commented

hi henggang,

the printed loss is the averaged minibatch loss since the last printing. for example,
mb1: 1, mb2: .9, then print 0.95,
mb3: .8, mb4: .7, then print 0.75
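
in code, the reporting behaves roughly like this (a minimal sketch, not the actual parameter_server implementation; the print interval is just an example):

```python
class LossReporter:
    """Averages minibatch losses and resets the average at every print.

    A sketch of the reporting behavior described above, not the real
    parameter_server code.
    """

    def __init__(self, print_every=2):
        self.print_every = print_every  # print after this many minibatches
        self.loss_sum = 0.0
        self.count = 0

    def report(self, minibatch_loss):
        self.loss_sum += minibatch_loss
        self.count += 1
        if self.count == self.print_every:
            print(f"loss: {self.loss_sum / self.count:.2f}")
            # reset, so the next print only covers the new minibatches
            self.loss_sum, self.count = 0.0, 0


reporter = LossReporter(print_every=2)
for loss in [1.0, 0.9, 0.8, 0.7]:
    reporter.report(loss)
# prints: loss: 0.95, then loss: 0.75
```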

if you set max_pass_of_data=1, then the printed loss can be viewed as a loss on unseen data (similar to a test loss), because each minibatch's loss is computed before the model is updated on that minibatch.

i usually set max_pass_of_data=1 when the dataset is big.
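
to see why one pass behaves like a test loss: in sgd the loss of each minibatch is evaluated with the parameters as they are before the update on that minibatch, so during a single pass every reported loss is measured on data the model has not yet been trained on. a toy sketch of that progressive evaluation (not parameter_server code, just a 1-D least-squares example):

```python
import random

def train_one_pass(weight, examples, lr=0.1):
    """One pass over the data (i.e. max_pass_of_data=1) for a toy
    1-D least-squares model; a sketch, not parameter_server code."""
    for x, y in examples:
        pred = weight * x
        loss = (pred - y) ** 2                   # evaluated BEFORE the update,
        print(f"progressive loss: {loss:.3f}")   # so the example is still "unseen"
        grad = 2 * (pred - y) * x
        weight -= lr * grad                      # update happens after reporting
    return weight

# toy data: y = 2x plus a little noise
data = [(0.1 * i, 0.2 * i + random.gauss(0, 0.1)) for i in range(20)]
train_one_pass(0.0, data)
```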