Liuhong99/Sophia

Is it applicable for any loss function?

Closed this issue · 2 comments

Hi, thanks for the great work. I noticed that the documented usage is for categorical logits. Does it only work with categorical logits? I am working on a regression task with an MSE loss on an LLM. Can I use Sophia for that, and if so, how?

+1

If you have a well-specified probabilistic model, then the Gauss-Newton-Bartlett (GNB) estimator will work as is. For example, suppose your probabilistic model for $y|x$ is $N(\mu_\theta(x), \sigma_\theta^2(x))$, where $\mu_\theta$ and $\sigma_\theta$ are neural nets (a common practice in DRL). Then you can use the same algorithm unchanged, at least in theory: sample labels from the model's predictive distribution instead of from a categorical over logits. This also works if the std of $y|x$ is known.
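To make the Gaussian case concrete, here is a minimal PyTorch sketch of one GNB step for a model that outputs a mean and a log-std. It is an illustration under assumptions, not code from the repository: `GaussianHead` and `gnb_diag_hessian` are hypothetical names, the architecture is arbitrary, and the estimate is computed by hand as batch size times the squared gradient of the loss on sampled labels. With the repository's optimizer, the README's pattern of calling `optimizer.update_hessian()` after `loss_sampled.backward()` would presumably take the place of the manual computation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class GaussianHead(nn.Module):
    """Hypothetical regressor that predicts a mean and a log-std per input."""
    def __init__(self, d_in):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, 16), nn.Tanh())
        self.mu = nn.Linear(16, 1)
        self.log_sigma = nn.Linear(16, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h).squeeze(-1), self.log_sigma(h).squeeze(-1)

def gnb_diag_hessian(model, x):
    """Return one GNB estimate of the diagonal Hessian, one tensor per parameter."""
    model.zero_grad(set_to_none=True)
    mu, log_sigma = model(x)
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    # Labels are sampled from the model itself, not taken from the dataset.
    y_sample = dist.sample()  # sample() does not propagate gradients
    # Gaussian negative log-likelihood of the sampled labels, averaged over the batch.
    loss_sampled = -dist.log_prob(y_sample).mean()
    loss_sampled.backward()
    batch_size = x.shape[0]
    # GNB estimate: batch size times the elementwise square of the mini-batch gradient.
    estimate = [batch_size * p.grad.detach() ** 2 for p in model.parameters()]
    model.zero_grad(set_to_none=True)
    return estimate

model = GaussianHead(d_in=4)
x = torch.randn(32, 4)  # synthetic inputs, for illustration only
h_hat = gnb_diag_hessian(model, x)
```

The only change from the categorical recipe is the sampling distribution and the matching log-likelihood; the real labels are never used in this step.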

However, if you simply have an MSE loss and the standard deviation of $y|x$ is not specified, then some tricks may be needed. We can only speculate here, without any theoretical or empirical evidence: you could first estimate the std of $y|x$, and then sample Gaussian labels from the model, using the model's output as the mean and the estimated std as the std. Hope this makes sense.
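The speculative recipe above could be sketched as follows. Everything here is an assumption made for illustration: the small `model`, the synthetic data, and in particular the choice to estimate a single homoscedastic std from the residuals of the current batch, which is a crude estimate (early in training the residuals reflect model error more than label noise).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical regressor and synthetic data, for illustration only.
model = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))
x = torch.randn(64, 4)
y = x.sum(dim=1) + 0.3 * torch.randn(64)

# Step 1: estimate one std of y|x from the residuals (assumed constant across x).
with torch.no_grad():
    residuals = y - model(x).squeeze(-1)
    sigma_hat = residuals.std().clamp_min(1e-6)

# Step 2: sample Gaussian labels centered on the model's own predictions.
model.zero_grad(set_to_none=True)
mu = model(x).squeeze(-1)
y_sample = (mu + sigma_hat * torch.randn_like(mu)).detach()

# Step 3: Gaussian negative log-likelihood of the sampled labels with fixed std
# (terms that do not depend on the parameters are dropped).
loss_sampled = (0.5 * ((mu - y_sample) / sigma_hat) ** 2).mean()
loss_sampled.backward()

# Step 4: GNB estimate, batch size times the elementwise squared gradient.
batch_size = x.shape[0]
h_hat = [batch_size * p.grad.detach() ** 2 for p in model.parameters()]
model.zero_grad(set_to_none=True)
```

Note that this estimates the curvature of the Gaussian negative log-likelihood, which for a fixed std equals the mean squared error divided by $2\hat\sigma^2$ (up to a constant). If the training loss is the plain MSE, the two curvatures differ by that factor, so the estimate or the optimizer's hyperparameters may need rescaling accordingly.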