lopuhin/transformer-lm

Validation loss not computed

Opened this issue · 5 comments

Is there a reason why the validation loss is neither computed nor logged when the model is trained with more than one GPU?

IIRC I didn't manage to make it work for some reason, so I ended up running validation from a separate process - but I also didn't train long enough to overfit.

Could you share that validation script?
I'm using this GPT model to train a different language altogether. Therefore, having the validation loss would be of great help!

If you pass the --only-validate option, the validation loss will be computed. The only caveat is that you need to make sure you're not using multiple GPUs (e.g. limit the run to one GPU with the CUDA_VISIBLE_DEVICES=0 environment variable):

transformer-lm/lm/main.py

Lines 251 to 256 in fa3f529

if only_validate:
    if world_size != 1:
        print('multi-GPU validation is not supported yet')
        sys.exit(1)
    if is_main:
        print(f'Validation loss: {get_valid_loss():.4f}')
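To make the snippet above concrete: the GPU restriction can be applied either in the shell or from Python before CUDA is initialized. This is an illustrative sketch, not part of the repo; the module path in the shell comment is inferred from lm/main.py, and your usual training arguments are elided.

```python
import os

# Restrict this process to a single GPU *before* torch/CUDA is initialized,
# so that world_size == 1 and the --only-validate branch above is taken.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Equivalent shell form (illustrative; pass your usual training arguments):
#   CUDA_VISIBLE_DEVICES=0 python -m lm.main ... --only-validate
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Note that CUDA_VISIBLE_DEVICES has no effect once CUDA has already been initialized in the process, which is why it must be set first.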

Got it! Thanks
Should I close this issue, given that the underlying problem of multi-GPU validation is still not solved?

Let's leave it open until it's supported. Thanks for the report - I hope this issue will be useful in the meantime.