PapersReimplementations

Personal short implementations of Machine Learning papers

Primary language: Jupyter Notebook. License: MIT.

Description

Personal re-implementations of well-known Machine Learning architectures, layers, algorithms and more. Re-implementations may be simplified and approximate. The goal is to learn, get familiar with, and practice the core concepts 🙂.

Packages

ddpm

Implementation of the "Denoising Diffusion Probabilistic Models" paper. I use the MNIST and FashionMNIST datasets as toy examples. The model is a custom U-Net-like architecture that uses positional embeddings. Pre-trained models for both datasets (trained for only 20 epochs) are provided through Git Large File Storage (LFS). Check out the blog for a step-by-step explanation.
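To give a rough idea of the forward (noising) process the model learns to reverse, here is a minimal NumPy sketch (NumPy rather than PyTorch, to keep it dependency-light). It assumes the linear β schedule from the DDPM paper (T = 1000, β from 1e-4 to 0.02) and 0-based timestep indices. It also assumes the positional embeddings encode the diffusion timestep, as in the paper, and concatenates sines and cosines, which is one common variant. `q_sample` and `sinusoidal_embedding` are hypothetical names, not functions from this repository.

```python
import numpy as np

# Linear noise schedule from the DDPM paper: T = 1000 steps, beta_t from 1e-4 to 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # alpha_bar_t = product of alpha_s for s <= t


def q_sample(x0, t, eps):
    """Draw x_t ~ q(x_t | x_0) in closed form (t is a 0-based step index)."""
    a_bar = alpha_bars[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps


def sinusoidal_embedding(t, dim):
    """Sinusoidal embedding of timestep t; sines and cosines are concatenated (dim must be even)."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000 ** (2 * i / dim))
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(1, 28, 28))  # stand-in for an MNIST image scaled to [-1, 1]
eps = rng.standard_normal(x0.shape)
x_t = q_sample(x0, 500, eps)  # noisy image the U-Net would receive
t_emb = sinusoidal_embedding(500, 100)  # timestep conditioning vector
```

During training, a U-Net would receive `x_t` together with the timestep embedding and be trained to predict `eps` with a mean-squared error, which is the simplified objective proposed in the paper.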

gpt

Decoder-only implementation of a GPT model, built on the transformer architecture from the "Attention Is All You Need" paper. I simply train the transformer on 1984, the novel by George Orwell, using next-character prediction. Samples generated by the model are stored in a file.
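A minimal sketch of the two core ingredients of next-character training, assuming a character-level vocabulary built from the training text: (input, target) pairs where the target is the input shifted by one character, and single-head causal self-attention that stops each position from attending to later ones. It is written in NumPy for illustration only; the sample string, `block_size`, `d_model` and all function names are hypothetical, and a full GPT stacks several multi-head attention and feed-forward blocks instead of this single head.

```python
import numpy as np

text = "hello world, hello transformer"  # placeholder for the training corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}  # character -> token id
itos = {i: c for c, i in stoi.items()}  # token id -> character


def encode(s):
    return [stoi[c] for c in s]


def decode(ids):
    return "".join(itos[i] for i in ids)


data = np.array(encode(text))
block_size = 8  # context length


def get_pair(start):
    """Input window and its target: the same window shifted by one character."""
    x = data[start : start + block_size]
    y = data[start + 1 : start + block_size + 1]
    return x, y


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    seq_len = x.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # True for j > i
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model = 16
emb_table = rng.standard_normal((len(chars), d_model))  # stand-in for a learned embedding
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
x, y = get_pair(0)
out, weights = causal_self_attention(emb_table[x], w_q, w_k, w_v)
```

Because future scores are set to -inf before the softmax, row i of `weights` only puts mass on positions 0..i, so one forward pass yields a next-character prediction for every prefix of the window.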

Samples obtained from a small transformer (depth 6, width 384, 6 heads) can be found under /gpt/generated.txt. Here are a few:

################ SAMPLE 6 ################
Winston's heart brostless, then she got up with
a trays that dark was governed upon. They were little because of what they
could give him a day afraid of the Ninth Three-Year Plenty went out. The
doors had looked into his arms. On the evenings he murmuries

################ SAMPLE 16 ################
g.

'But this iden't boy,' he said, lave he said. 'The Party, didn't mean
it got into the barmty of Newspeak.'--and then he safer anything victim round
anything, as I'm reading to be anything but. They can't be take they can't
shoe, an year ago:

'Ah, five

################ SAMPLE 18 ################
Gothern's earlier stations and elusions
against planned the steps. The next purpose were interested. The same of
the tongue of the Revolution is were reality. The programmans that he had
stopped involving the Spies intercounted as phrase.

nf

Implementation of the "Density estimation using Real NVP" paper. I re-implement and stack 30 affine coupling layers to create a normalizing flow that can generate MNIST digits. Each generated digit comes with an associated log probability, which indicates which images the model considers most likely. Here's a glance at the (not so impeccable) final result:
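A single affine coupling layer, the building block stacked in the flow described above, can be sketched as follows. The sketch assumes 1-D inputs with an alternating binary mask b (the paper uses checkerboard and channel-wise masks on images) and uses tiny fixed random linear maps as stand-ins for the learned scale and translation networks s and t. It follows the paper's form y = b⊙x + (1−b)⊙(x⊙exp(s(b⊙x)) + t(b⊙x)), whose log-determinant is the sum of the unmasked entries of s. All names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 6  # input dimensionality (hypothetical; a flattened MNIST digit would be 784)

# Alternating binary mask b: entries with b = 1 pass through unchanged.
mask = np.array([1.0, 0.0] * (D // 2))

# Stand-ins for the learned networks s(.) and t(.): fixed random linear maps.
W_s = rng.standard_normal((D, D)) * 0.1
W_t = rng.standard_normal((D, D)) * 0.1


def s_net(x):
    return np.tanh(x @ W_s)  # tanh keeps the log-scale bounded


def t_net(x):
    return x @ W_t


def coupling_forward(x):
    """Map x -> y and return log|det J| per sample."""
    xb = x * mask
    s = s_net(xb) * (1 - mask)
    t = t_net(xb) * (1 - mask)
    y = xb + (1 - mask) * (x * np.exp(s) + t)
    return y, s.sum(axis=-1)


def coupling_inverse(y):
    """Exact inverse: the masked half of y equals that of x, so s and t can be recomputed."""
    yb = y * mask
    s = s_net(yb) * (1 - mask)
    t = t_net(yb) * (1 - mask)
    return yb + (1 - mask) * (y - t) * np.exp(-s)


def standard_normal_logpdf(z):
    return -0.5 * (z**2 + np.log(2 * np.pi)).sum(axis=-1)


x = rng.standard_normal((4, D))
y, log_det = coupling_forward(x)
log_px = standard_normal_logpdf(y) + log_det  # change of variables for a single layer
```

The log-determinant is what turns the flow into a density model: with a stack of layers, the per-layer terms are summed and added to the base-distribution log density, as `log_px` does here for one layer.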

ppo

Implementation of the famous "Proximal Policy Optimization Algorithms" paper. I implement the simple PPO algorithm from scratch in PyTorch, using Weights & Biases to log the loss terms and the average reward across iterations.
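The heart of PPO is the clipped surrogate objective L^CLIP = E[min(r_t A_t, clip(r_t, 1−ε, 1+ε) A_t)], where r_t is the probability ratio between the new and old policies and A_t is the advantage estimate. Below is a minimal NumPy sketch of that loss, negated so it can be minimized. It assumes per-sample log-probabilities and advantages are already computed; `ppo_clip_loss` is a hypothetical name, and ε = 0.2 is the value used in the paper's experiments, not necessarily the one in this repository. The full algorithm also adds a value-function loss and optionally an entropy bonus.

```python
import numpy as np


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over a batch."""
    ratio = np.exp(new_log_probs - old_log_probs)  # r_t = pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Take the pessimistic (lower) of the two terms, then negate for gradient descent.
    return -np.mean(np.minimum(unclipped, clipped))


# Two samples: ratio 2 with advantage +1, ratio 0.5 with advantage -1.
loss = ppo_clip_loss(np.log([0.5, 0.2]), np.log([0.25, 0.4]), np.array([1.0, -1.0]))
```

In this example the first sample contributes min(2, 1.2) = 1.2 and the second min(-0.5, -0.8) = -0.8, so the loss is -(1.2 - 0.8) / 2 = -0.2: the policy gets no extra credit for moving the ratio beyond 1 ± ε.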

vit

Implementation of the "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" paper. The MNIST dataset is used as a toy example for the classification task. Blog.
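The paper's first step is to cut each image into fixed-size patches, flatten and linearly project them, prepend a class token, and add positional embeddings. Since 16×16 patches do not divide a 28×28 MNIST digit evenly, this NumPy sketch assumes 7×7 patches (a 4×4 grid of 16 patches). The patch size, `d_model`, and the random arrays standing in for learned parameters are all hypothetical.

```python
import numpy as np


def patchify(images, patch_size):
    """Split (N, C, H, W) images into (N, num_patches, C * patch_size**2) flattened patches."""
    n, c, h, w = images.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    x = images.reshape(n, c, gh, patch_size, gw, patch_size)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (N, grid_row, grid_col, C, p, p)
    return x.reshape(n, gh * gw, c * patch_size * patch_size)  # patches in row-major order


rng = np.random.default_rng(0)
images = rng.random((2, 1, 28, 28))  # stand-in for a batch of MNIST digits
patches = patchify(images, 7)  # (2, 16, 49)

d_model = 8
W_embed = rng.standard_normal((49, d_model))  # stand-in for the learned linear projection
cls_token = np.zeros((1, 1, d_model))  # stand-in for the learned class token
pos_embed = rng.standard_normal((1, 17, d_model)) * 0.02  # one position per token, incl. class token

tokens = patches @ W_embed
tokens = np.concatenate([np.repeat(cls_token, tokens.shape[0], axis=0), tokens], axis=1)
tokens = tokens + pos_embed  # (2, 17, 8): sequence fed to the transformer encoder
```

For classification, the encoder output at the class-token position would then go through a small head that predicts one of the 10 digit classes.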