ml-efficiency
There are 3 repositories under the ml-efficiency topic.
mosaicml/composer
Supercharge Your Model Training
stsxxx/MoDM
MoDM is a cache-aware, hybrid serving system that accelerates image generation by dynamically combining small and large diffusion models for efficient, high-quality output.
MyDarapy/SmolLM-experiments-with-grouped-query-attention
(Unofficial) Building Hugging Face SmolLM, a blazingly fast and small language model, with a PyTorch implementation of grouped query attention (GQA)