A CUDA implementation of a sparse CNN inference accelerator that stores both activations and weights in compressed form. The code is optimized for the NVIDIA GTX 980.
Primary language: C++
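
To illustrate what compressed activation and weight storage can look like, here is a minimal host-side C++ sketch. It is not taken from this repository: the `CompressedVector` layout (non-zero values plus their dense indices) and the helper names `compress` and `sparse_dot` are assumptions chosen for illustration, and the actual on-device format may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical compressed vector: only non-zero values are kept,
// each paired with its position in the original dense vector.
struct CompressedVector {
    std::size_t dense_size = 0;          // length of the original dense vector
    std::vector<float> values;           // non-zero values, in ascending index order
    std::vector<std::uint32_t> indices;  // dense position of each stored value
};

// Drop zeros from a dense vector and record where the survivors came from.
CompressedVector compress(const std::vector<float>& dense) {
    CompressedVector out;
    out.dense_size = dense.size();
    for (std::size_t i = 0; i < dense.size(); ++i) {
        if (dense[i] != 0.0f) {
            out.values.push_back(dense[i]);
            out.indices.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return out;
}

// Dot product of compressed activations and compressed weights.
// Both index lists are sorted, so a two-pointer merge multiplies only
// the positions where both operands are non-zero.
float sparse_dot(const CompressedVector& a, const CompressedVector& w) {
    assert(a.dense_size == w.dense_size);
    float acc = 0.0f;
    std::size_t i = 0, j = 0;
    while (i < a.indices.size() && j < w.indices.size()) {
        if (a.indices[i] == w.indices[j]) {
            acc += a.values[i] * w.values[j];
            ++i;
            ++j;
        } else if (a.indices[i] < w.indices[j]) {
            ++i;  // activation has no matching non-zero weight
        } else {
            ++j;  // weight has no matching non-zero activation
        }
    }
    return acc;
}
```

In this sketch, both memory use and the number of multiplications scale with the count of non-zero entries rather than with the dense size, which is the general motivation for compressing both operands in sparse inference.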