martius-lab/beta-nll


Closed this issue · 2 comments

Hi,

thanks for reaching out!

For your first question: init_var_offset controls the initial variance (given by initial_var) that the model predicts at the start of training. In this line, self.init_var_offset is added to the network's raw variance output before the softplus. The offset is chosen such that the returned variance is exactly initial_var when the network's raw output is zero; the line you pointed out is essentially the inverse of the softplus function.
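As an illustration (not the repo's exact code), the offset can be computed with the inverse softplus, so that a raw network output of zero maps to exactly initial_var; the names `inverse_softplus` and `predicted_variance` below are hypothetical:

```python
import math

def inverse_softplus(y: float) -> float:
    # Inverse of softplus(x) = log(1 + exp(x)); only valid for y > 0.
    return math.log(math.expm1(y))

initial_var = 1.0
init_var_offset = inverse_softplus(initial_var)

def predicted_variance(raw_output: float) -> float:
    # softplus(raw + offset): equals initial_var when raw_output == 0,
    # and stays strictly positive for any raw_output.
    return math.log1p(math.exp(raw_output + init_var_offset))

# A raw output of zero yields the configured initial variance.
assert abs(predicted_variance(0.0) - initial_var) < 1e-9
```

Because softplus is monotonic, the network can still move the variance freely in either direction during training; the offset only shifts where it starts.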

The rationale for setting this value is that it can help optimization when the overall scale of the variance is already known before training. This feature was not used for the paper.

Regarding your second question, this does seem to be an error: max_var is indeed set to 100. The exact value should not matter much and was chosen arbitrarily. The reason for having it is to prevent outliers with extremely high variance predictions from causing large gradients. On the sine dataset, it should not matter at all.
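A minimal sketch of such a cap (the repo may implement it differently, e.g. with torch.clamp; `clamp_variance` here is a hypothetical name):

```python
def clamp_variance(var: float, max_var: float = 100.0) -> float:
    # Cap the predicted variance so that a single outlier with an
    # extreme variance prediction cannot produce huge gradients.
    return min(var, max_var)

assert clamp_variance(250.0) == 100.0  # capped
assert clamp_variance(3.0) == 3.0      # unchanged below the cap
```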

Feel free to ask further questions should you have any.

Well, adding the offset results in a different loss, and thus a different gradient. So no, it does affect training (which is the whole point of doing it).
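This can be seen directly from the softplus parameterization sketched above: shifting the input by a constant changes the local slope (a sigmoid) at every raw output, so the gradient flowing back into the network differs. A small numerical check under that assumed parameterization:

```python
import math

def sigmoid(x: float) -> float:
    # Derivative of softplus(x) = log(1 + exp(x)).
    return 1.0 / (1.0 + math.exp(-x))

# Slope of the variance head at a raw output of zero,
# with and without an offset of inverse_softplus(1.0) ~= 0.5413.
offset = math.log(math.expm1(1.0))
grad_without_offset = sigmoid(0.0)          # 0.5
grad_with_offset = sigmoid(0.0 + offset)    # larger than 0.5

assert grad_with_offset != grad_without_offset
```

So any term of the loss that depends on the predicted variance receives a different gradient once the offset is applied.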