Why use normalized predicted values and non-normalized target values to calculate losses?
Tangjhno1 opened this issue · 8 comments
How do you explain this operation? Won't it hurt accuracy?
We didn't experiment with normalized targets, though it might improve the accuracy. We might retrain the model with this change, and if it gives better results, we will update the results and the checkpoints accordingly.
Thank you for your observation.
I'm sorry, I may not have made myself clear. I saw in the source code that the target values used for calculating the loss are not normalized, and the loss values given in the paper are all around 50. I don't understand what this means.
I just saw that the title mentions "normalized predicted values", but we don't normalize the predicted values at all. Since the targets are not normalized, it's normal for the loss to be around 50 or so, since that value corresponds to the non-normalized GHI. In the updated version of the paper, we will include normalized metrics such as MAPE.
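To illustrate the difference in scale, here is a minimal sketch comparing an error in the targets' own units with a normalized metric such as MAPE. The GHI values are made up for illustration, and using RMSE here is an assumption, not necessarily the loss reported in the paper.

```python
import math

def rmse(targets, preds):
    """Root-mean-square error, in the same units as the targets."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets))

def mape(targets, preds):
    """Mean absolute percentage error in percent; assumes no target is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(targets, preds)) / len(targets)

# Hypothetical GHI values in W/m**2 (not taken from the paper or the dataset).
ghi_true = [200.0, 500.0, 800.0]
ghi_pred = [250.0, 450.0, 850.0]

rmse_value = rmse(ghi_true, ghi_pred)  # error expressed in W/m**2
mape_value = mape(ghi_true, ghi_pred)  # error expressed as a percentage

print(f"RMSE: {rmse_value:.2f} W/m**2")
print(f"MAPE: {mape_value:.2f} %")
```

Every prediction here is off by 50 W/m**2, so the RMSE is 50 in the units of GHI, while the MAPE is a unit-free percentage. Note that MAPE is undefined when a target is zero, which happens for GHI at night, so such samples would need to be filtered or handled separately.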
The normalization I mentioned in the title refers to subtracting the mean from the input data x and dividing by the variance. The predicted value is then also between 0 and 1, but the true value used to calculate the loss is not normalized. I've never seen this before, so I'm confused about the motive. Thank you very much for your reply!
I understand your question better now. The inputs are indeed normalized, but since the model is trained to match non-normalized targets, it will learn to output non-normalized values outside of [0,1]. No normalization is applied after the last weight matrix, so that weight matrix can take whatever values are needed to match the target.
I do agree, though, that using normalized targets might improve the performance further, since I think it would stabilize the training (the values of the last weight matrix wouldn't have to be very large to map inputs in [0,1] to unbounded outputs), but we didn't do it in our experiments.
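For reference, a minimal sketch of what normalizing the targets could look like. This is not the code from this repository: the values, the variable names, and the choice of mean/standard-deviation scaling are all assumptions for illustration.

```python
import statistics

# Hypothetical training targets (GHI in W/m**2); not real station data.
ghi_train = [0.0, 150.0, 420.0, 780.0, 610.0]

# Statistics are computed on the training targets only.
mu = statistics.fmean(ghi_train)
sigma = statistics.pstdev(ghi_train)

def normalize(y):
    """Map a target from W/m**2 to normalized units."""
    return (y - mu) / sigma

def denormalize(z):
    """Map a normalized prediction back to W/m**2."""
    return z * sigma + mu

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

targets_norm = [normalize(y) for y in ghi_train]

# Stand-in for model outputs in normalized units (hypothetical values).
preds_norm = [z + 0.1 for z in targets_norm]
preds = [denormalize(z) for z in preds_norm]

loss_norm = mse(targets_norm, preds_norm)  # loss the model would be trained on
loss_orig = mse(ghi_train, preds)          # same error reported in (W/m**2)**2

print(loss_norm, loss_orig, loss_norm * sigma ** 2)
```

With this scaling, the MSE in original units equals the normalized MSE multiplied by sigma squared, so a model trained on normalized targets can still be evaluated in W/m**2 by de-normalizing its predictions first.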
I hope this clarifies things further and thanks for your questions!
Thank you very much for your answer!
I tried normalizing the target values, but found that the model's convergence curve was not as good as before. It may be necessary to adjust the hyper-parameters.
I still have one question. The meteorological station data mentioned in your paper contains measurements of the pressure at the station, clear-sky components, Direct Normal Irradiance (DNI), and Diffuse Horizontal Irradiance (DHI). But the experimental code uses DIF, DIR, GHI, PoPoPoPo, dhi, dni, and ghi. I assume PoPoPoPo is the pressure at the station and GHI (ghi) is the irradiance, but what do the uppercase and lowercase names represent respectively? And what do the other channels mean?
We only use a subset of these channels because all stations share that same subset of channels. Here's a breakdown of the meaning of each channel:
GHI: GHI at the station
DIF [W/m**2]: DHI at the station
DIR [W/m**2]: DNI at the station
PoPoPoPo [hPa]: Pressure at the station
dhi: Clear-Sky DHI
dni: Clear-Sky DNI
ghi: Clear-Sky GHI
Thank you for your clarification. Your research is very meaningful and has been of great help to me!