jcjohnson/torch-rnn

Loss increases gradually

binary-person opened this issue · 1 comment

As I get past iteration 70000, the loss and val_loss seem to increase and then decrease. I'm training on a 2.3 GB dataset with the following command:

th train.lua -input_h5 data/all.h5 -input_json data/all.json -model_type lstm -num_layers 3 -rnn_size 512 -batch_size 100 -seq_length 100 -print_every 1 -checkpoint_every 10000 -reset_iterations 0 -max_epochs 10

Here's a log.txt file of the entire training:
log-file.txt

In my experience, several things can cause this. Your learning rate might be too high, or you might need to decay it faster for a large dataset. For example, this is the script I use for my 230 MB dataset (using my torch-rnn fork, which has extra features):

BASECMD='th train.lua -input_h5 data/combined-latest.h5 -input_json data/combined-latest.json -gpu 0 -gpu_opt -2 -low_mem_dropout 0 -dropout 0.10 -shuffle_data 1 -zoneout 0.01'
MODEL='-seq_offset 1 -model_type gridgru -wordvec_size 1024 -rnn_size 2048 -num_layers 4'
CPNAME='cv/combined-20190212'
CPINT=10214
CMD="$BASECMD $MODEL -checkpoint_every $CPINT"
export CUDA_VISIBLE_DEVICES=1,2
mkdir -p $CPNAME

$CMD -batch_size 128 -seq_length 256  -max_epochs 6  -learning_rate 4e-4   -lr_decay_every 2 -lr_decay_factor 0.5 -checkpoint_name $CPNAME/a -print_every 250
$CMD -batch_size 64  -seq_length 512  -max_epochs 10 -learning_rate 5e-5   -lr_decay_every 2 -lr_decay_factor 0.7 -checkpoint_name $CPNAME/b -print_every 250 -init_from $CPNAME/a_$(($CPINT*6)).t7  -reset_iterations 0
$CMD -batch_size 32  -seq_length 1024 -max_epochs 14 -learning_rate 2.5e-5 -lr_decay_every 2 -lr_decay_factor 0.7 -checkpoint_name $CPNAME/c -print_every 250 -init_from $CPNAME/b_$(($CPINT*10)).t7  -reset_iterations 0
$CMD -batch_size 16  -seq_length 2048 -max_epochs 16 -learning_rate 1.2e-5 -lr_decay_every 1 -lr_decay_factor 0.5 -checkpoint_name $CPNAME/d -print_every 250 -init_from $CPNAME/c_$(($CPINT*14)).t7 -reset_iterations 0
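
If you stay on stock torch-rnn, the same idea maps onto the built-in -learning_rate, -lr_decay_every and -lr_decay_factor options of train.lua. This is only a sketch based on your command above; the lower starting rate and per-epoch decay are guesses to experiment with, not tuned values:

# Sketch only: stock torch-rnn flags, with a lower starting LR and a decay every epoch
th train.lua -input_h5 data/all.h5 -input_json data/all.json -model_type lstm -num_layers 3 -rnn_size 512 -batch_size 100 -seq_length 100 -max_epochs 10 -learning_rate 1e-3 -lr_decay_every 1 -lr_decay_factor 0.5 -print_every 1 -checkpoint_every 10000 -reset_iterations 0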

It could also be caused by not randomly shuffling the order of the sequences during training. This is also implemented in my fork, and it made my models train more smoothly.
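
For reference, that shuffling is the -shuffle_data flag already visible in BASECMD above; it is a fork feature, not a stock torch-rnn option. With the fork installed, your original command would roughly just gain that flag:

# Sketch only: fork's -shuffle_data flag added to the original command
th train.lua -input_h5 data/all.h5 -input_json data/all.json -model_type lstm -num_layers 3 -rnn_size 512 -batch_size 100 -seq_length 100 -shuffle_data 1 -print_every 1 -checkpoint_every 10000 -reset_iterations 0 -max_epochs 10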