mhorlacher opened this issue 8 months ago · 0 comments
Hi,
When I apply the hyperparameters listed for the 24M model in the table below, the resulting model has only ~2M parameters. Any thoughts on what could cause this?