Problem predicting higher age values
Hello!
I'm attempting to train the network from scratch using the UTK dataset.
I'm discarding people labeled as younger than 16 or older than 70. The only change I've made to the original script is setting the "NUM_CLASSES" parameter to 55 to reflect the age range I'm working with.
Training goes well, and the MSE and MAE are consistent with yours, but when predicting on UTK samples I find that I cannot infer ages past a certain value. In my latest test, for example, while I get satisfactory results for ages 16-40, I can't get any predictions to go above that (see attached picture).
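For reference, my filtering step looks roughly like this (a sketch; it assumes the standard UTKFace naming scheme where the age is the first underscore-separated field of each filename, and `data_dir` is a placeholder path):

```python
import os

# UTKFace filenames look like: [age]_[gender]_[race]_[datetime].jpg
data_dir = 'UTKFace'  # hypothetical path to the extracted dataset

kept = []
for fname in os.listdir(data_dir):
    try:
        age = int(fname.split('_')[0])
    except ValueError:
        continue  # skip files that don't follow the naming scheme
    if 16 <= age <= 70:
        kept.append((fname, age))

print(f'{len(kept)} images in the 16-70 range')
```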
Do you have any insight that might help me? Other than that, congratulations on the paper; it's been helping me a lot.
Hm, that's strange. So, basically what's happening is that the network never (or only rarely) predicts labels larger than 40? I could see this happening if only very few examples in the training set were above that age, but based on the dataset distribution, that doesn't seem to be the case.
Just to make sure, have you checked the age distribution just in case -- to rule out that something happened during the partitioning?
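For example, something along these lines makes it easy to spot a skewed partition (a sketch, assuming the labels are stored in CSV files with an `age` column; the file names are placeholders):

```python
import pandas as pd

# hypothetical label files for the train/test partitions
train_df = pd.read_csv('utk_train.csv')
test_df = pd.read_csv('utk_test.csv')

# compare the per-age counts of both partitions side by side
counts = pd.DataFrame({
    'train': train_df['age'].value_counts().sort_index(),
    'test': test_df['age'].value_counts().sort_index(),
}).fillna(0)
print(counts)
```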
Also, have you normalized the labels so that they start at 0 (i.e., subtracted 16 from them)? And vice versa: to go back from the 0-54 label range to the 16-70 age range, you would have to add 16 to the final predictions.
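In code, that mapping is just an offset applied in both directions (a minimal sketch; `MIN_AGE` and the variable names are placeholders):

```python
import numpy as np

MIN_AGE = 16  # youngest age kept after filtering

ages = np.array([16, 25, 70])       # raw ages from the dataset
labels = ages - MIN_AGE             # -> class labels 0, 9, 54 for training

predicted_labels = np.array([0, 9, 54])      # e.g., predicted class indices
predicted_ages = predicted_labels + MIN_AGE  # -> back to ages 16, 25, 70
```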
Thank you for answering!
I managed to solve the issue by retraining the network a couple of times, and honestly, I cannot tell you what was happening, since I made no changes that should have affected the predictions. My best guess is that my train/test split was not properly balanced, or something along those lines.
I can now reasonably reproduce your results, but I get slightly higher MAE values (5.65 with importance weights) than those reported. Any tips on how to achieve scores closer to yours?
Hm, that's weird, but glad to hear that it works now. In general, I find it a bit frustrating that code from the common deep learning frameworks (TensorFlow, PyTorch, mxnet, etc.) is not fully reproducible on GPUs -- probably due to the CUDA/cuDNN code. I talked to someone at NVIDIA a while back, and one problem with CNN training, for example, is that the algorithm for approximating the convolution is chosen at runtime and consequently varies between machines and runs. Setting cuDNN to deterministic mode (as I've done in all the scripts here) solved the issue for me. However, I've heard that different GPU types can still produce slightly different results.
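For reference, the deterministic setup amounts to something like this in PyTorch (a sketch; the exact seed value and how the scripts expose it are placeholders here):

```python
import random
import numpy as np
import torch

RANDOM_SEED = 123  # hypothetical; use whatever seed the scripts expose

# seed every source of randomness involved in training
random.seed(RANDOM_SEED)
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
torch.cuda.manual_seed_all(RANDOM_SEED)

# force cuDNN to pick deterministic convolution algorithms
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```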
Not sure if it's useful to you, but I've attached the results from six runs (3x with and 3x without importance weights) on a different machine. On average, the results are about the same as in the paper. I also included the output prediction tensors in case you need them for some kind of analysis.
Please let me know if you have any other queries.