the device used in training
Opened this issue · 1 comment
tianlianghai commented
What device did you use in training? A batch size of 512 per V100 (16 GB) leads to an OOM error,
but if I use a small batch, the loss goes to NaN.
train_fasternet_m(){
python train_test.py -g 0,1 --num_nodes 1 -n 4 -b 1024 -e 500 \
--pin_memory --wandb_project_name fasternet \
--model_ckpt_dir ./model_ckpt/$(date +'%Y%m%d_%H%M%S') --cfg cfg/fasternet_m.yaml
}
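A common workaround when a large batch OOMs but a small batch makes the loss unstable is gradient accumulation: process several small micro-batches, sum their gradients, and apply a single optimizer step, so the effective batch size matches the stable setting while memory only ever holds one micro-batch. The following is a minimal sketch of the idea with a toy one-weight model in plain Python; the variable names (`micro_batch`, `accum_steps`) are illustrative assumptions, not part of the repo's actual training script, which may expose an equivalent accumulation option through its own config.

```python
import random

random.seed(0)

# Toy data: y = 3x + small noise; we fit a single weight w.
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in [i / 50 for i in range(100)]]

w = 0.0
lr = 0.1
micro_batch = 8    # what fits in memory per forward/backward pass
accum_steps = 4    # effective batch = micro_batch * accum_steps = 32

for epoch in range(50):
    random.shuffle(data)
    grad_sum, seen = 0.0, 0
    for x, y in data:
        # gradient of 0.5 * (w*x - y)^2 with respect to w
        grad_sum += (w * x - y) * x
        seen += 1
        # apply one optimizer step only after a full effective batch
        if seen == micro_batch * accum_steps:
            w -= lr * grad_sum / seen  # average gradient over effective batch
            grad_sum, seen = 0.0, 0

print(w)  # w should approach the true slope of 3.0
```

The same principle applies unchanged to a deep-learning training loop: zero the gradients only every `accum_steps` iterations and scale the loss (or the accumulated gradient) by the number of accumulated steps before stepping.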
99-WSJ commented
Hello, did you solve it? It also occurs in my experiments.