Pinned Repositories
fp8-benchmark
nnutils
CPU & CUDA implementations of several neural network utilities
probot
PyLaia-examples
A set of experiments using PyLaia on different datasets
UVA
UVA programming challenges
PyLaia
A deep learning toolkit specialized for handwritten document analysis
lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch that enables using different hardware executors at once, across one or thousands of GPUs.
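As a toy illustration of what "source-to-source compiler" means (this is not Thunder's actual machinery, just a minimal sketch using Python's standard `ast` module), a pass can parse a function's source, rewrite the tree, and emit new source — here folding away multiplications by `1.0`:

```python
import ast

class DropMulOne(ast.NodeTransformer):
    """Toy source-to-source pass: rewrite `x * 1.0` into `x`."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform children first
        if (isinstance(node.op, ast.Mult)
                and isinstance(node.right, ast.Constant)
                and node.right.value == 1.0):
            return node.left  # drop the redundant multiplication
        return node

src = "def f(x):\n    return x * 1.0 + 2"
tree = ast.fix_missing_locations(DropMulOne().visit(ast.parse(src)))
new_src = ast.unparse(tree)
print(new_src)  # the `* 1.0` is gone: `return x + 2`
```

A real compiler like Thunder works on traced programs and dispatches regions to optimized executors; the point here is only the shape of the idea: program in, transformed program out.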
litgpt
Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
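Of the techniques listed, LoRA is easy to sketch in isolation. The following is a conceptual, dependency-free model (plain Python lists, not litgpt's actual implementation): the pretrained weight `W` stays frozen and a low-rank update `B @ A`, scaled by `alpha / r`, is trained on top:

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[i] * v[i] for i in range(len(v))) for row in m]

class LoRALinear:
    """y = W x + (alpha / r) * B (A x); only A and B would be trained."""
    def __init__(self, w, r=2, alpha=4.0):
        self.w = w  # frozen pretrained weight, shape d_out x d_in
        d_out, d_in = len(w), len(w[0])
        rng = random.Random(0)
        # A starts random, B starts at zero, so initially y == W x exactly
        self.a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def __call__(self, x):
        base = matvec(self.w, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [base[i] + self.scale * delta[i] for i in range(len(base))]

layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]])
print(layer([3.0, 4.0]))  # B is zero, so this equals W x: [3.0, 4.0]
```

The payoff is parameter count: for a `d_out x d_in` layer, training only `A` and `B` costs `r * (d_in + d_out)` parameters instead of `d_in * d_out`.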
pytorch-lightning
Pretrain, finetune and deploy AI models on multiple GPUs and TPUs with zero code changes.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
carmocca's Repositories
carmocca/PyLaia-examples
A set of experiments using PyLaia on different datasets
carmocca/UVA
UVA programming challenges
carmocca/fp8-benchmark
carmocca/nnutils
CPU & CUDA implementations of several neural network utilities
carmocca/probot
carmocca/PyLaia
A deep learning toolkit for handwritten document analysis
carmocca/AdventOfCode2016
My Java solutions to the programming puzzles.
carmocca/algorhythmHashCode
Repo to practice Google's HashCode problems
carmocca/DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
carmocca/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
carmocca/faster-pytorch-blog
Outlining techniques for improving the training performance of your PyTorch model without compromising its accuracy
carmocca/ffcv
FFCV: Fast Forward Computer Vision (and other ML workloads!)
carmocca/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
carmocca/lightning
Build and train PyTorch models and connect them to the ML lifecycle using Lightning App templates, without handling DIY infrastructure, cost management, scaling, and other headaches.
carmocca/lightning-quick-start
carmocca/lightning-thunder
Source-to-source compiler for PyTorch. It makes PyTorch programs faster on single accelerators and in distributed settings.
carmocca/litdata
Blazingly fast, distributed streaming of training data from any cloud storage for training AI models
carmocca/litgpt
Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
carmocca/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
carmocca/Megatron-LM
Ongoing research training transformer models at scale
carmocca/neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
carmocca/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
carmocca/stable-diffusion
A latent text-to-image diffusion model
carmocca/taming-transformers
Taming Transformers for High-Resolution Image Synthesis
carmocca/toolbox
Essential guides and programming tools in my toolbox (with focus on ML Training)
carmocca/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
carmocca/Windifier
Wind [instrument] classifier, using CNNs.
carmocca/xla
Enabling PyTorch on Google TPU