jmliu206/LIC_TCM

The problem of training

Closed this issue · 4 comments

Is "MSE LOSS is 0.001 or 0.000 from epoch 0 and 16000/183885 (9%)" right?

My training set has 180k+ images and my test set has 50+.

When lambda is set to 0.05:
Test epoch 0: Average losses: Loss: 2.970 | MSE loss: 0.001 | Bpp loss: 0.79 | Aux loss: 37.53
Test epoch 1: Average losses: Loss: 2.344 | MSE loss: 0.000 | Bpp loss: 0.74 | Aux loss: 8.57

Is this situation reasonable?

Thanks a lot!

I think this is reasonable: only three decimal places are shown here, so a small nonzero MSE rounds to 0.001 or 0.000. You can modify the code to display more digits.
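As a sanity check, assuming the standard CompressAI rate-distortion loss (Loss = λ·255²·MSE + Bpp), the epoch 1 numbers give MSE ≈ (2.344 − 0.74) / (0.05·255²) ≈ 5e-4, which indeed rounds to 0.000. Below is a minimal logging sketch with more decimal places; the function name and its arguments are hypothetical stand-ins for the averaged meters in the actual test loop:

```python
def log_test_losses(epoch, loss, mse_loss, bpp_loss, aux_loss):
    """Print averaged test losses; hypothetical helper, not the repo's code."""
    print(
        f"Test epoch {epoch}: Average losses: "
        f"Loss: {loss:.4f} | "
        f"MSE loss: {mse_loss:.6f} | "  # six decimal places instead of three
        f"Bpp loss: {bpp_loss:.4f} | "
        f"Aux loss: {aux_loss:.2f}"
    )

log_test_losses(1, 2.344, 0.000493, 0.74, 8.57)
# Test epoch 1: Average losses: Loss: 2.3440 | MSE loss: 0.000493 | Bpp loss: 0.7400 | Aux loss: 8.57
```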

Thanks a lot!

I have another question:

I also encountered the error "super().__init__(entropy_bottleneck_channels=N)
TypeError: __init__() got an unexpected keyword argument 'entropy_bottleneck_channels'", and I have seen that the suggested solution is to use compressai version 1.2.0.
But I instead changed the code to "super().__init__()" rather than "super().__init__(entropy_bottleneck_channels=N)", and my compressai version is still 1.2.4.
I also found that the training time for one epoch is around 7 hours on a single 4090. Is there an efficient way to save time while keeping the model's accuracy?
I wonder if I am handling this problem incorrectly.

Thanks a lot!!

You are correct. In version 1.2.4, you can directly use super().__init__(). However, I am not quite sure whether this affects the final result, as I have not run experiments on that version. Intuitively, it should not have an impact.
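For anyone hitting the same TypeError, here is a minimal sketch of a version-compatible constructor. The class name MyModel and N=128 are placeholders; registering the EntropyBottleneck explicitly mirrors how recent CompressAI models are written, but check your installed version's API before relying on it:

```python
from compressai.entropy_models import EntropyBottleneck
from compressai.models import CompressionModel

class MyModel(CompressionModel):  # placeholder for the TCM model class
    def __init__(self, N=128):
        # compressai 1.2.0 style (raises TypeError on 1.2.4):
        #   super().__init__(entropy_bottleneck_channels=N)
        # compressai 1.2.4 style: bare call, then register the
        # entropy bottleneck on the module yourself.
        super().__init__()
        self.entropy_bottleneck = EntropyBottleneck(N)
```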

Regarding the training time, you may try the following methods:

  1. You can first train a model at a higher bitrate and then fine-tune it to obtain models for other bitrate points (starting either from that model or from the pre-trained model I provided). This will speed up convergence; see the sketch after this list.

  2. Set a smaller N to reduce the model size. At low bitrates, a smaller N can still achieve satisfactory results.
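A minimal sketch of method 1, reusing the MyModel placeholder from the earlier snippet; the checkpoint filename, the "state_dict" key, the learning rate, and the lambda value are all assumptions to adapt to the actual training script:

```python
import torch

# Method 1 (hypothetical setup): load a higher-bitrate checkpoint and
# fine-tune it toward a lower rate point.
net = MyModel(N=128)  # N must match the source checkpoint's architecture
ckpt = torch.load("checkpoint_lambda0.05.pth.tar", map_location="cpu")
net.load_state_dict(ckpt["state_dict"])

# A reduced learning rate is typical for fine-tuning; the new lambda
# targets a lower bitrate than the source model was trained for.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-5)
lmbda = 0.013  # hypothetical lower rate-distortion weight

# Method 2: for low-bitrate points, a smaller model can be trained
# from scratch instead, e.g. MyModel(N=64), cutting per-epoch time.
```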

Thanks again!