/minillm

Primary LanguagePython

minillm

Google Transformer Architecture:

The Transformer architecture, introduced by Google researchers in the paper "Attention Is All You Need" (2017), is a key innovation in deep learning for natural language processing (NLP). It is the foundational model for many advanced NLP applications.

Foundation for ChatGPT:

Models like ChatGPTdeveloped by OpenAI, are built on variations of the Transformer architecture. While ChatGPT is based on the GPT (Generative Pre-trained Transformer) model, which extends the Transformer architecture, it incorporates additional techniques and training strategies to enhance its performance.

https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

Transformer is a neural network architecture that performs sequence to sequence learning:

Project Overview

To run the code you just need to have python 3.10 installed in your system. Then run following command:

make all

It will download the data and run the training.