Pinned Repositories
CNNumpy
A NumPy implementation of a Convolutional Neural Network, in slow and fast (im2col/col2im) versions.
Coursera-Labs
Coursera lab assignments for the "Machine Learning" course and the "Deep Learning Specialization".
DSD-training
Implementation of DSD: Dense-Sparse-Dense Training for Deep Neural Networks in PyTorch.
GANumpy
A NumPy implementation of a Generative Adversarial Network.
GPTQ-for-RWKV
pathtracer-rust
A naive Monte Carlo path tracer written in Rust.
Research-Paper-Summary
Summaries and implementations of deep learning research papers in TensorFlow/PyTorch.
Yaae
Yaae: Yet another autodiff engine (written in NumPy).
nanotron
Minimalistic large language model 3D-parallelism training
torchinfer
Deep learning inference framework [WIP]
3outeille's Repositories
3outeille/3outeille
3outeille/dotfiles-vastai
3outeille/fsdp_tp_example
Minimum working example of FSDP and TP (using dmesh and process groups).
3outeille/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
3outeille/accelerated-scan
Accelerated First Order Parallel Associative Scan
3outeille/candle
Minimalist ML framework for Rust
3outeille/ChatRWKV
ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
3outeille/dotfiles-tensordock
3outeille/flash_attention_numba
3outeille/fms-fsdp
Demonstrate throughput of PyTorch FSDP
3outeille/ggml
Tensor library for machine learning
3outeille/gptcore
Fast modular code to create and train cutting edge LLMs
3outeille/lighteval
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
3outeille/lightning-GPT
Train and run GPTs with Lightning
3outeille/llamafile
Distribute and run LLMs with a single file.
3outeille/mad-lab
A MAD laboratory to improve AI architecture designs 🧪
3outeille/mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
3outeille/megatron-smol-cluster
Megatron-LM setup in the smol-cluster
3outeille/MS-AMP
Microsoft Automatic Mixed Precision Library
3outeille/nanotron
Minimalistic large language model 3D-parallelism training
3outeille/Open-Sora
Open-Sora, an open-source initiative dedicated to efficiently reproducing OpenAI's Sora
3outeille/pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
3outeille/RWKV-CUDA
The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)
3outeille/too-many-lists
Learn Rust by writing Entirely Too Many linked lists
3outeille/torchtitan
A native PyTorch Library for large model training
3outeille/train_your_own_sora
3outeille/transformers_demystified
3outeille/Tune-A-Video
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
3outeille/veScale
A PyTorch Native LLM Training Framework
3outeille/xv6-public
xv6 OS