/GPTs-from-scratch

Primary LanguageJupyter Notebook

GPTs

This is intended to be a step by step guide on how to implement any neural network architecture between linear regression and the transformer. This repo will guide you step by step with incremental changes to the code upto complex architectures. You are free to experiment with them as much as you want and i encourage you to do that.

The following is a performance table for the architectures with training loss value, number of parameters of the model and time to run (its not proportional to no. of params because some architectures reuse weights many times).

Model Type Number of Parameters Time
Linear Regression 2 ?