Structured Sparsity Inducing Adaptive Optimizers for Deep Learning

This is the repository for the paper:

Tristan Deleu, Yoshua Bengio, Structured Sparsity Inducing Adaptive Optimizers for Deep Learning [ArXiv]

This repository contains:

The weighted and unweighted proximal operators for the l1/l2 and group MCP penalties.
A modification of AdamW from Hugging Face's transformers library that adds a proximal step and is compatible with the structured sparsity inducing penalties in this repository.
The definition of the groups (channel-wise and row-wise) for some deep learning architectures (VGG, ResNet, BERT).
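
To illustrate what a proximal operator and a proximal step do, here is a minimal sketch in pure Python. It is not the code from this repository: the function names prox_group_l1_l2 and proximal_sgd_step are hypothetical, only the unweighted l1/l2 penalty is covered, and a plain gradient update stands in for AdamW to keep the example short. Each group is assumed to be a flat list of weights, for example one channel or one row.

```python
import math


def prox_group_l1_l2(groups, threshold):
    """Block soft-thresholding: the proximal operator of
    threshold * sum_g ||w_g||_2 (unweighted l1/l2 penalty).

    `groups` is a list of groups, each a list of floats (illustrative layout).
    """
    result = []
    for group in groups:
        norm = math.sqrt(sum(w * w for w in group))
        # A group whose norm is at most the threshold is set to zero as a whole;
        # otherwise its norm is shrunk by exactly `threshold`.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        result.append([scale * w for w in group])
    return result


def proximal_sgd_step(groups, grads, lr, penalty):
    """One proximal gradient step (assumption: plain SGD update, not AdamW).

    First take the usual gradient step, then apply the proximal operator
    with threshold lr * penalty.
    """
    updated = [
        [w - lr * g for w, g in zip(ws, gs)]
        for ws, gs in zip(groups, grads)
    ]
    return prox_group_l1_l2(updated, lr * penalty)


if __name__ == "__main__":
    weights = [[3.0, 4.0], [0.3, 0.4]]
    # The first group is shrunk, the second is removed entirely.
    print(prox_group_l1_l2(weights, 1.0))
```

Because whole groups are zeroed at once, the sparsity pattern follows the group structure (channels or rows) rather than individual weights.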

tristandeleu/pytorch-structured-sparsity