/pytorch-structured-sparsity

Code for "Structured Sparsity Inducing Adaptive Optimizers for Deep Learning" in PyTorch

Primary language: Python. License: MIT.

Structured Sparsity Inducing Adaptive Optimizers for Deep Learning

This is the repository for the paper

Tristan Deleu, Yoshua Bengio, Structured Sparsity Inducing Adaptive Optimizers for Deep Learning [ArXiv]

This repository contains:

  • The weighted and unweighted proximal operators for the l1/l2 and group MCP penalties
  • A modification of AdamW from Hugging Face's transformers library to include a proximal step, compatible with the structured sparsity inducing penalties in this repository.
  • The definition of the groups (channel-wise & row-wise) for some deep learning architectures (VGG, ResNet, BERT).
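
As an illustration of the first item, the unweighted proximal operator of the l1/l2 (group lasso) penalty has a well-known closed form, often called block soft-thresholding: each group is shrunk toward zero by a fixed amount in norm, and set exactly to zero when its norm falls below the threshold. The sketch below is not taken from this repository. It uses plain Python lists rather than PyTorch tensors, and the names `prox_group_l2`, `prox_rows`, and `threshold` are hypothetical. It treats each row of a weight matrix as one group, which is only an assumption about what "row-wise" means here.

```python
import math


def prox_group_l2(group, threshold):
    """Block soft-thresholding: prox of threshold * ||.||_2 evaluated at `group`.

    `group` is a list of floats forming one group of weights.
    `threshold` is assumed to be (learning rate) * (penalty strength).
    """
    norm = math.sqrt(sum(w * w for w in group))
    if norm <= threshold:
        # The whole group is zeroed out: this is what yields structured sparsity.
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * w for w in group]


def prox_rows(matrix, threshold):
    """Apply the operator to each row of `matrix`, treating rows as groups."""
    return [prox_group_l2(row, threshold) for row in matrix]


weights = [[3.0, 4.0], [0.1, -0.2]]
pruned = prox_rows(weights, threshold=1.0)
# The first row (norm 5) is rescaled by 1 - 1/5; the second row
# (norm below the threshold) is set to zero.
```

In a proximal optimizer, a step like this would typically be applied to the parameters right after the gradient update. The weighted variants and the group MCP penalty listed above require a different computation and are not covered by this sketch.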