This repository is a collection of experiments I have done with images, so that I can learn how to manipulate images and understand the various deep learning architectures used to generate them.
1. UNet 2D
A basic diffusion model for generating images.
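
As a rough illustration of what a diffusion model trains on (a minimal, dependency-free sketch, not the code in this repository), the forward process mixes an image with Gaussian noise according to a schedule, and the network is typically trained to predict that noise. The linear schedule, its default values, and the helper names `linear_beta_schedule`, `cumulative_alphas`, and `add_noise` are assumptions made for this example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return per-step noise variances, linearly spaced (assumed defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) up to step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal_scale * x + noise_scale * e for x, e in zip(pixels, noise)]
    return noisy, noise


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# A flattened toy "image" with pixel values scaled to [-1, 1].
image = [0.5, -0.25, 1.0, -1.0]
noisy, noise = add_noise(image, t=500, alpha_bars=alpha_bars, rng=rng)
```

In a real training loop, a UNet would receive `noisy` and the step `t` and be asked to predict `noise`; here plain lists stand in for image tensors to keep the sketch self-contained.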
A model that learns to represent images as discrete tokens, so it can be used for image tokenization. Includes PyTorch training code for both the Generator and the Discriminator.
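
One common way to turn continuous features into discrete tokens is a nearest-neighbour lookup in a learned codebook. The sketch below shows only that lookup on toy data; whether the model here quantizes exactly this way is an assumption, and the names `quantize`, `decode`, and the tiny hand-written `codebook` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


def decode(tokens, codebook):
    """Replace each token index with its codebook vector."""
    return [codebook[t] for t in tokens]


# Toy codebook with three 2-D entries; a trained model would learn these.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]

# Toy encoder outputs, one vector per image patch.
features = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

tokens = quantize(features, codebook)
reconstructed = decode(tokens, codebook)
print(tokens)  # [1, 0, 2]
```

The resulting integer indices are the image tokens; a decoder (the Generator) would reconstruct the image from the corresponding codebook vectors.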
This is my playground for experimenting with images. I will add more models and experiments as I learn about other architectures and techniques. The final goal is to add multimodality to Smol-LM.
- The code is not optimized for performance. It is written in a way that is easy for me to experiment with.
- Improvements and suggestions are welcome. Feel free to open an issue or a pull request.
- Experiments with audio are kept in AudioExpts.