karpathy/llama2.c

I found that the dim parameter affects the learning loss and n_layers affects the training speed.

win10ogod opened this issue · 0 comments

In my training runs, the `dim` parameter affects the training loss, and `n_layers` affects the training speed.
[Screenshot 2023-10-07 184924]
[Screenshot 2023-10-07 185043]
The smaller configuration took 30 minutes to train.
The configuration with more layers reached a loss of only 2, but it took 3 hours.
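
One way to reason about this is to estimate how the parameter count scales with each setting. The sketch below is a rough estimate, not code from the repo: the helper names `ffn_hidden_dim` and `estimate_params` are hypothetical, and it assumes a Llama-style block (four `dim x dim` attention projections with `n_kv_heads == n_heads`, a three-matrix SwiGLU feed-forward, two RMSNorm weight vectors per layer), a vocabulary of 32000, `multiple_of=32`, and input/output embeddings that share weights.

```python
def ffn_hidden_dim(dim: int, multiple_of: int = 32) -> int:
    """Feed-forward width: roughly 8/3 * dim, rounded up to a multiple of `multiple_of` (assumed rule)."""
    hidden = int(2 * (4 * dim) / 3)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def estimate_params(dim: int, n_layers: int, vocab_size: int = 32000,
                    multiple_of: int = 32) -> int:
    """Approximate parameter count under the assumptions stated above."""
    hidden = ffn_hidden_dim(dim, multiple_of)
    attention = 4 * dim * dim        # wq, wk, wv, wo (assumes n_kv_heads == n_heads)
    feed_forward = 3 * dim * hidden  # w1, w2, w3 of the SwiGLU block
    norms = 2 * dim                  # attention norm + feed-forward norm
    per_layer = attention + feed_forward + norms
    embedding = vocab_size * dim     # assumed shared with the output projection
    final_norm = dim
    return embedding + n_layers * per_layer + final_norm


if __name__ == "__main__":
    # Compare how the estimate grows when changing n_layers versus dim.
    for dim, n_layers in [(288, 6), (288, 12), (576, 6)]:
        total = estimate_params(dim, n_layers)
        print(f"dim={dim:4d} n_layers={n_layers:2d} -> {total / 1e6:.2f}M parameters")
```

Under these assumptions, the per-layer cost grows linearly with `n_layers` but roughly quadratically with `dim`, and each extra layer also adds sequential work per step. That is consistent with deeper models training more slowly, but the actual timings depend on hardware, batch size, and sequence length.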