mrsergazinov/gluformer

about the variance estimation

orangeniux opened this issue · 1 comment

For the Variance layer, should the second activation function be ReLU? Tanh allows the possibility of a negative variance. Or is this step not about the sampling variance? Please advise, thanks a lot!

You are likely referring to the Variance layer in the gluformer/variance.py file, which indeed returns a value in [-10, 10]: the last activation function is tanh, and the output is then manually scaled by a factor of 10.
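
For context, the output head follows the tanh-then-scale pattern sketched below. This is a minimal illustration of the idea, not the repository's exact code; the class name, layer sizes, and `d_model` parameter are assumptions for the example:

```python
import torch
import torch.nn as nn

class LogVarianceHead(nn.Module):
    """Illustrative sketch (not the actual gluformer code): maps hidden
    features to a bounded log-variance.

    tanh squashes the linear projection into [-1, 1]; multiplying by 10
    stretches it to [-10, 10]. The output is interpreted as log(sigma^2),
    not sigma^2 itself, so a negative value is perfectly valid.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 10.0 * torch.tanh(self.proj(x))  # log-variance in [-10, 10]
```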

This is not the variance but rather the log-variance; see the error computation in model_train.py (line 143) and the associated process_batch function in utils/train.py. Hence, the variance is indeed restricted to the range [exp(-10), exp(10)], which avoids numerical instability during training.
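
To make the log-variance parameterization concrete, here is a hedged sketch of a Gaussian negative log-likelihood computed from a predicted mean and log-variance. The function name and shapes are illustrative assumptions, not the exact code in model_train.py or process_batch:

```python
import torch

def gaussian_nll(mean: torch.Tensor,
                 log_var: torch.Tensor,
                 target: torch.Tensor) -> torch.Tensor:
    """Illustrative negative log-likelihood of `target` under
    N(mean, exp(log_var)), with the additive constant dropped.

    Because the network predicts log(sigma^2) directly, the implied
    variance exp(log_var) is always positive; bounding log_var to
    [-10, 10] keeps it in [exp(-10), exp(10)] and prevents overflow
    or division by a near-zero variance.
    """
    var = torch.exp(log_var)
    return 0.5 * (log_var + (target - mean) ** 2 / var).mean()
```

Predicting the log-variance is a standard trick: the network output can range over all reals (here, a bounded subset of them) while the implied variance stays strictly positive, so no ReLU is needed.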