
Virtues-of-Sparsity

An attempted implementation and experiment with continuous sparsification and sparse representations (using k-winner activations together with duty cycles) in Transformers, on Named Entity Recognition (CoNLL 2003).

- Run this for training with continuous sparsification.
- Run this for training with sparse representations, using the method from here.
- Run this for training the baseline Transformer.
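The sparse-representations variant keeps only the k most active units per layer, boosted by duty cycles so rarely-active units still get a chance to win. The repo's actual implementation is not shown here; below is a minimal NumPy sketch of that idea, where `k_winners`, the boost formula, and the duty-cycle update rule are illustrative assumptions, not the repo's code:

```python
import numpy as np

def k_winners(x, k, duty_cycles, boost_strength=1.0):
    """Keep the top-k units per row (by boosted activation); zero out the rest.

    Hypothetical sketch: units whose duty cycle is below the target activation
    frequency get an exponential boost, so they compete better for a winner slot.
    """
    target = k / x.shape[1]                                   # desired fraction of active units
    boost = np.exp(boost_strength * (target - duty_cycles))   # favour rarely-active units
    winners = np.argpartition(-(x * boost), k, axis=1)[:, :k] # indices of the k largest boosted values
    mask = np.zeros_like(x)
    mask[np.arange(x.shape[0])[:, None], winners] = 1.0
    return x * mask                                           # unboosted values, sparsified

# Toy usage: duty cycles are a running average of how often each unit fires
batch = np.random.default_rng(0).standard_normal((32, 128))
duty = np.zeros(128)
out = k_winners(batch, k=16, duty_cycles=duty)
duty = 0.99 * duty + 0.01 * (out != 0).mean(axis=0)
```

Note the boost only decides *which* units win; the returned activations are the original, unboosted values.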

As for results, the baseline Transformer still performs best, but continuous sparsification reaches roughly the same score with about half the parameters (it trims the parameter count to almost half of the original).
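The parameter halving comes from continuous sparsification gating each weight with a soft mask sigmoid(beta * s) that is gradually hardened toward 0/1 during training, after which weights with a non-positive gate parameter are pruned. As a rough illustration of that mechanism (class name, initialisation, and annealing schedule are all hypothetical, not the repo's implementation):

```python
import numpy as np

class ContinuousSparseLinear:
    """Linear layer whose weights are gated by a soft mask sigmoid(beta * s).

    Hypothetical sketch of continuous sparsification: beta is annealed upward
    during training so the mask hardens toward binary, and at the end weights
    whose gate parameter s is <= 0 are pruned away.
    """

    def __init__(self, in_dim, out_dim, s_init=0.0, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((out_dim, in_dim)) * 0.02  # ordinary weights
        self.s = np.full((out_dim, in_dim), s_init)             # learnable gate parameters
        self.beta = 1.0                                         # mask temperature

    def forward(self, x):
        mask = 1.0 / (1.0 + np.exp(-self.beta * self.s))        # soft mask in (0, 1)
        return x @ (self.w * mask).T

    def anneal(self, factor=1.1):
        self.beta *= factor                                     # harden the gate over training

    def prune(self):
        # Final binary mask: keep only weights whose gate parameter is positive
        return self.w * (self.s > 0)

layer = ContinuousSparseLinear(16, 8)
y = layer.forward(np.ones((4, 16)))
for _ in range(10):
    layer.anneal()
```

In a real training loop, `s` would be updated by gradient descent with an L1-style penalty on the mask, so the optimizer is pushed to drive most gates negative; the halved parameter count reported above is the fraction of weights surviving `prune()`.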