Quentin-Anthony
High-Performance Deep Learning PhD student at OSU NOWLAB
The Ohio State University, Columbus, OH
Pinned Repositories
cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
DeepSpeedExamples
Example models using DeepSpeed
magma
MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multilingual models from Aleph Alpha, check out our website https://app.aleph-alpha.com
ml-engineering
Machine Learning Engineering Open Book
xminGPT
BlackMamba
Code repository for BlackMamba
zcookbook
Training hybrid models for dummies.
Quentin-Anthony's Repositories
Quentin-Anthony/magma
MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multilingual models from Aleph Alpha, check out our website https://app.aleph-alpha.com
Quentin-Anthony/DeepSpeedExamples
Example models using DeepSpeed
Quentin-Anthony/ml-engineering
Machine Learning Engineering Open Book
Quentin-Anthony/xminGPT
Quentin-Anthony/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Quentin-Anthony/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
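As a rough sketch of how a model gets wrapped for DeepSpeed training, the example below passes a toy PyTorch model and an illustrative config dict to deepspeed.initialize; the config values are placeholder assumptions, not settings recommended by this fork.

```python
# Hedged sketch: wrapping a PyTorch model with the DeepSpeed engine.
# The config values below are illustrative placeholders only.
import torch
import deepspeed

model = torch.nn.Linear(512, 512)
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler);
# the engine then handles ZeRO sharding, mixed precision, and gradient accumulation.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```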
Quentin-Anthony/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and MXNet.
Quentin-Anthony/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including BERT & GPT-2
Quentin-Anthony/Awesome-LLMs-on-device
Awesome LLMs on Device: A Comprehensive Survey
Quentin-Anthony/awesome-open-source-lms
Friends of OLMo and their links.
Quentin-Anthony/aws-neuron-reference-for-megatron-lm
Quentin-Anthony/BERT-PyTorch
BERT for Distributed PyTorch + AMP Training
Quentin-Anthony/DL-SR
TensorFlow/Keras implementation for transforming a low-resolution (LR) image into a super-resolved one, including single wide-field (WF) image super-resolution prediction and SIM reconstruction.
Quentin-Anthony/examples
A set of examples around PyTorch in vision, text, reinforcement learning, etc.
Quentin-Anthony/flash-attention
Fast and memory-efficient exact attention
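As a hedged sketch of what "exact attention" looks like in use, the example below relies on PyTorch's built-in torch.nn.functional.scaled_dot_product_attention, which can dispatch to a FlashAttention-style fused kernel on supported GPUs; it is not this repository's own flash_attn API, and the tensor shapes are arbitrary.

```python
# Minimal sketch: exact scaled dot-product attention via PyTorch's fused kernel.
# NOT the flash-attention package's own interface; shapes are arbitrary examples.
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

batch, heads, seq_len, head_dim = 2, 8, 1024, 64
q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal attention; matches the naive softmax(QK^T / sqrt(d)) V result without
# materializing the full seq_len x seq_len score matrix.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```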
Quentin-Anthony/gpu-questions
Quentin-Anthony/grace
GRACE - GRAdient ComprEssion for distributed deep learning
Quentin-Anthony/HierarchicalKV
HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. Its key capability is storing key-value feature embeddings in the high-bandwidth memory (HBM) of GPUs and in host memory. It can also be used as a generic key-value store.
Quentin-Anthony/ior
IOR and mdtest parallel I/O benchmarks
Quentin-Anthony/Megatron-DeepSpeed-MS
Ongoing research training transformer language models at scale, including BERT & GPT-2
Quentin-Anthony/minGPT
minGPT in JAX
Quentin-Anthony/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Quentin-Anthony/NeMo
NeMo: a toolkit for conversational AI
Quentin-Anthony/Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm (less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel stochastic gradient descent (SGD) optimizer, and its convergence is proven both theoretically and empirically.
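To make the sparse-gradient idea concrete, here is a minimal sketch of plain top-k gradient compression, the building block that schemes like Ok-Topk refine; it is not this repository's algorithm, and the sparse allreduce, error feedback, and decentralized SGD optimizer are all omitted.

```python
# Minimal sketch of top-k gradient sparsification (NOT Ok-Topk's actual algorithm):
# keep only the k largest-magnitude gradient entries and communicate those.
import torch

def topk_compress(grad: torch.Tensor, k: int):
    """Return (indices, values) for the k largest-magnitude entries of grad."""
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)
    return idx, flat[idx]

def topk_decompress(idx: torch.Tensor, values: torch.Tensor, shape) -> torch.Tensor:
    """Rebuild a dense tensor from the communicated (index, value) pairs."""
    dense = torch.zeros(shape, device=values.device, dtype=values.dtype)
    dense.view(-1)[idx] = values
    return dense

grad = torch.randn(1024, 1024)
idx, vals = topk_compress(grad, k=grad.numel() // 100)  # send ~1% of entries
# In distributed training, (idx, vals) would go through a sparse allreduce here.
restored = topk_decompress(idx, vals, grad.shape)
print(f"kept {vals.numel()} / {grad.numel()} entries")
```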
Quentin-Anthony/pythia
Quentin-Anthony/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Quentin-Anthony/Quentin-Anthony.github.io
A beautiful, simple, clean, and responsive Jekyll theme for academics
Quentin-Anthony/RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings.
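As a heavily simplified sketch of the RNN-style view, the toy recurrence below keeps an exponentially decayed weighted average of values keyed by exp(k); it is an assumption-laden illustration of the general idea, not the actual RWKV time-mixing kernel.

```python
# Toy sketch of an RNN-style weighted-average recurrence, loosely in the spirit of
# RWKV-like models. NOT the real RWKV kernel: per-channel decay, the current-token
# "bonus" term, and numerical stabilization are all omitted.
import torch

def simple_wkv(keys: torch.Tensor, values: torch.Tensor, decay: float = 0.9):
    """keys, values: (seq_len, dim). Returns (seq_len, dim) recurrent outputs."""
    seq_len, dim = keys.shape
    num = torch.zeros(dim)  # decayed sum of exp(k_t) * v_t
    den = torch.zeros(dim)  # decayed sum of exp(k_t)
    outputs = []
    for t in range(seq_len):
        w = torch.exp(keys[t])
        num = decay * num + w * values[t]
        den = decay * den + w
        outputs.append(num / (den + 1e-8))
    return torch.stack(outputs)

out = simple_wkv(torch.randn(16, 8), torch.randn(16, 8))
print(out.shape)  # torch.Size([16, 8])
```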
Quentin-Anthony/SwinIR
SwinIR: Image Restoration Using Swin Transformer
Quentin-Anthony/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.