lisa-lab/pylearn2

Adding training cost as default monitor channel

Opened this issue · 6 comments

We've been trying to find a way to get the training cost at a high level in the code, but it seems it's being discarded and is not available. One way we considered would be to add it to the monitor by default, even when there's no validation dataset.

Would this be possible to add? It'd make integrating pylearn2 into Python code and getting useful metrics back much easier.

Thanks!

cc: @ssamot

dwf commented

For a classification task, usually either train_objective or train_nll is what you want.
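
For reference, here's a rough sketch of pulling those channels back into Python after training; it assumes the model was pickled via Train's save_path ('model.pkl' is just a placeholder name):

```python
# Read the monitored training-cost channel from a saved pylearn2 model.
# Assumes the model was saved via Train(..., save_path='model.pkl').
from pylearn2.utils import serial

model = serial.load('model.pkl')
channel = model.monitor.channels['train_objective']  # or 'train_nll'

# One value per monitoring step, paired with the epoch it was recorded at.
for epoch, value in zip(channel.epoch_record, channel.val_record):
    print(epoch, value)
```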

Thanks, going to look into it now!

I noticed there are no channels (not even training-cost-related ones) if no validation set is specified for monitoring. Is that correct?

dwf commented

Monitoring channels are computed with respect to monitoring datasets. If there are no datasets specified for monitoring (it doesn't necessarily need to be a validation set, though that's usually the useful thing to monitor), then typically very little will be monitored.
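
A minimal sketch of what that looks like in code, assuming you already have a DenseDesignMatrix called train and a model called model (both placeholders for your own setup); passing the training set as a monitoring dataset is what produces the train_* channels mentioned above:

```python
# Monitor on the training set only; no validation set is required.
from pylearn2.train import Train
from pylearn2.training_algorithms.sgd import SGD
from pylearn2.termination_criteria import EpochCounter

algorithm = SGD(
    learning_rate=0.01,
    batch_size=100,
    monitoring_dataset={'train': train},  # dict keys become channel prefixes
    termination_criterion=EpochCounter(max_epochs=10),
)
Train(dataset=train, model=model, algorithm=algorithm,
      save_path='model.pkl', save_freq=1).main_loop()
```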

Does that mean you need at least two passes to get a cost function value? Shouldn't you be able to get this from just one pass of the training set?

dwf commented

You could get something like a cost function value with SGD. But you're updating parameters on each minibatch, so what you can actually get is an evaluation of the cost before each step. By the time you've finished a pass through the dataset, you have a potentially very different network from the one you used to compute your cost estimate on the first minibatch, so averaging those per-minibatch costs together gives you something that's potentially quite pessimistic and doesn't really correspond to the training cost of any particular version of the network.
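
To make that concrete, here's a toy numpy illustration (not pylearn2 code): the per-minibatch costs are each computed with different parameters, so their average over the epoch is typically higher than the cost of the final parameters evaluated on the same data.

```python
# Toy SGD on least squares: compare the epoch-average of per-minibatch costs
# (each computed just before a step) with the cost of the final weights.
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = X.dot(rng.randn(5)) + 0.1 * rng.randn(1000)

w = np.zeros(5)
lr, batch_size = 0.05, 50
per_minibatch_costs = []

for start in range(0, len(X), batch_size):
    xb, yb = X[start:start + batch_size], y[start:start + batch_size]
    err = xb.dot(w) - yb
    per_minibatch_costs.append(0.5 * np.mean(err ** 2))  # cost before the step
    w -= lr * xb.T.dot(err) / batch_size                 # SGD update

print(np.mean(per_minibatch_costs))        # average over a moving target
print(0.5 * np.mean((X.dot(w) - y) ** 2))  # one fixed model, whole set
```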

Suffice it to say, that isn't implemented at the moment. I'm not saying it's not potentially a little bit useful, but you might get a more reliable estimate of the objective function with a monitoring dataset that consists of a larger-than-minibatch-but-still-smaller-than-whole-dataset sample from your training set: e.g. the equivalent of 10-20 minibatches, but all evaluated with the same model parameters.
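
As a sketch of that suggestion (assuming train is a DenseDesignMatrix, as in the earlier example), you could carve out a fixed random subset of the training set worth roughly 20 minibatches and monitor on that:

```python
import numpy as np
from pylearn2.datasets.dense_design_matrix import DenseDesignMatrix

batch_size = 100
n_monitor = 20 * batch_size                    # ~20 minibatches of examples
rng = np.random.RandomState(0)
idx = rng.permutation(train.X.shape[0])[:n_monitor]
train_subset = DenseDesignMatrix(X=train.X[idx], y=train.y[idx])

# Pass monitoring_dataset={'train_subset': train_subset} to SGD; the monitor
# will then report channels such as train_subset_objective, all evaluated
# with the same model parameters at each monitoring step.
```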

I'm not saying it's not potentially a little bit useful

I understand the argument about getting a pessimistic version of the cost value (as you average over mini-batches during learning), but you can use this over training epochs to get a somewhat "free" indication of whether you are going anywhere. Yes, it's not perfect, but computation-wise it should be close to nil, right?