Pinned Repositories
fp8-benchmark
nnutils
CPU & CUDA implementations of several neural network utilities
probot
PyLaia-examples
A set of experiments using PyLaia on different datasets
UVA
UVA programming challenges
PyLaia
A deep learning toolkit specialized for handwritten document analysis
lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch that enables using different hardware executors at once, across one or thousands of GPUs.
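As a toy illustration of what "source-to-source compiler" means (this is not Thunder's actual machinery, just a minimal sketch using Python's standard `ast` module), a pass can parse a function's source, rewrite the tree, and emit new source — here folding away multiplications by `1.0`:

```python
import ast

class DropMulOne(ast.NodeTransformer):
    """Toy source-to-source pass: rewrite `x * 1.0` into `x`."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform children first
        if (isinstance(node.op, ast.Mult)
                and isinstance(node.right, ast.Constant)
                and node.right.value == 1.0):
            return node.left  # drop the redundant multiplication
        return node

src = "def f(x):\n    return x * 1.0 + 2"
tree = ast.fix_missing_locations(DropMulOne().visit(ast.parse(src)))
new_src = ast.unparse(tree)
print(new_src)  # the `* 1.0` is gone: `return x + 2`
```

A real compiler like Thunder works on traced programs and dispatches regions to optimized executors; the point here is only the shape of the idea: program in, transformed program out.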
litgpt
Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
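Of the techniques listed, LoRA is easy to sketch in isolation. The following is a conceptual, dependency-free model (plain Python lists, not litgpt's actual implementation): the pretrained weight `W` stays frozen and a low-rank update `B @ A`, scaled by `alpha / r`, is trained on top:

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]

class LoRALinear:
    """y = W x + (alpha / r) * B (A x); only A and B would be trained."""
    def __init__(self, w, r=2, alpha=4.0):
        self.w = w  # frozen pretrained weight, shape d_out x d_in
        d_out, d_in = len(w), len(w[0])
        rng = random.Random(0)
        # A starts random, B starts at zero, so initially y == W x exactly
        self.a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def __call__(self, x):
        base = matvec(self.w, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [base[i] + self.scale * delta[i] for i in range(len(base))]

layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]])
print(layer([3.0, 4.0]))  # B is zero, so this equals W x: [3.0, 4.0]
```

The payoff is parameter count: for a `d_out x d_in` layer, training only `A` and `B` costs `r * (d_in + d_out)` parameters instead of `d_in * d_out`.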
pytorch-lightning
Pretrain, finetune and deploy AI models on multiple GPUs and TPUs with zero code changes.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
carmocca's Repositories
carmocca/PyLaia-examples
A set of experiments using PyLaia on different datasets
carmocca/UVA
UVA programming challenges
carmocca/fp8-benchmark
carmocca/nnutils
CPU & CUDA implementations of several neural network utilities
carmocca/probot
carmocca/PyLaia
A deep learning toolkit for handwritten document analysis
carmocca/AdventOfCode2016
My Java solutions to the programming puzzles.
carmocca/algorhythmHashCode
Repo to practice Google's HashCode problems
carmocca/DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
carmocca/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
carmocca/faster-pytorch-blog
Outlining techniques for improving the training performance of your PyTorch model without compromising its accuracy
carmocca/ffcv
FFCV: Fast Forward Computer Vision (and other ML workloads!)
carmocca/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
carmocca/lightning
Build and train PyTorch models and connect them to the ML lifecycle using Lightning App templates, without handling DIY infrastructure, cost management, scaling, and other headaches.
carmocca/lightning-quick-start
carmocca/lightning-thunder
Source-to-source compiler for PyTorch. It makes PyTorch programs faster on single accelerators and in distributed settings.
carmocca/litdata
Blazingly fast, distributed streaming of training data from any cloud storage for training AI models
carmocca/litgpt
Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
carmocca/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
carmocca/Megatron-LM
Ongoing research training transformer models at scale
carmocca/neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
carmocca/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
carmocca/stable-diffusion
A latent text-to-image diffusion model
carmocca/taming-transformers
Taming Transformers for High-Resolution Image Synthesis
carmocca/toolbox
Essential guides and programming tools in my toolbox (with focus on ML Training)
carmocca/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
carmocca/Windifier
Wind [instrument] classifier, using CNNs.
carmocca/xla
Enabling PyTorch on Google TPU