graykode/gpt-2-Pytorch

Discrepancy in Parameter Size of Smallest Model

mmdalix opened this issue

I have been using the GPT-2 implementation from your repository and noticed that the smallest model it provides does not match the size of the smallest model reported in the original GPT-2 paper.
Specifically, the smallest model in the repository has about 124M parameters, whereas the smallest model in the original paper is listed as 117M.
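For reference, the ~124M figure can be reproduced by hand. The sketch below uses a hypothetical helper, `gpt2_param_count`, which is not part of this repository. It assumes the standard GPT-2 small hyperparameters (50,257-token vocabulary, 1,024-token context, 768-dimensional embeddings, 12 layers) and assumes that the output layer reuses the token-embedding matrix, so no separate LM-head weights are counted.

```python
# Hypothetical helper (not from this repo): counts the parameters of a
# GPT-2-style decoder, ASSUMING tied token-embedding / LM-head weights.

def gpt2_param_count(n_vocab=50257, n_ctx=1024, n_embd=768, n_layer=12):
    """Return the total number of trainable parameters."""
    wte = n_vocab * n_embd                        # token embeddings (shared with LM head)
    wpe = n_ctx * n_embd                          # learned positional embeddings
    layer_norm = 2 * n_embd                       # LayerNorm gain + bias
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd   # fused Q, K, V projection
    attn_out = n_embd * n_embd + n_embd           # attention output projection
    mlp_in = n_embd * 4 * n_embd + 4 * n_embd     # MLP expansion to 4x width
    mlp_out = 4 * n_embd * n_embd + n_embd        # MLP projection back to n_embd
    per_block = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out
    final_ln = layer_norm                         # LayerNorm after the last block
    return wte + wpe + n_layer * per_block + final_ln


total = gpt2_param_count()
print(f"{total:,}")  # 124,439,808
```

Under those assumptions the count is 124,439,808, so the repository's figure is consistent with the architecture; the sketch does not by itself account for the 117M number in the paper.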

I am curious to know why this difference exists.