Compressing Word Embeddings via Deep Compositional Code Learning
Abstract
- propose multi-codebook quantization of word embeddings
- `V x d` word embedding is compressed into `M x K` multi-codebook embedding
- use Gumbel-softmax trick in quantization network
- Compression rate of 98% in sentiment analysis and 94~99% in machine translation w/o performance loss
Details
Introduction
- Word Embedding takes up memory/storage
- `100K` words with `1000` dimensions == `100M` embedding parameters, which is often `> 95%` of the parameters in NLP tasks (see the memory sketch after this list)
- follows the intuition of creating partially shared embeddings
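
A quick back-of-envelope check of where that memory goes. The `M`, `K` values below are illustrative codebook sizes, not the paper's exact per-task settings:

```python
import math

# Baseline embedding table vs. compositional codes + shared codebooks.
# V, d follow the example above; M, K are illustrative codebook sizes.
V, d = 100_000, 1000      # vocabulary size, embedding dimension
M, K = 32, 16             # number of codebooks, codewords per codebook

baseline_mb = V * d * 4 / 1e6                          # float32 embedding table
codebook_mb = M * K * d * 4 / 1e6                      # shared codeword vectors
codes_mb = V * M * math.ceil(math.log2(K)) / 8 / 1e6   # discrete codes per word

compressed_mb = codebook_mb + codes_mb
print(f"baseline {baseline_mb:.0f} MB -> compressed {compressed_mb:.1f} MB "
      f"({100 * (1 - compressed_mb / baseline_mb):.1f}% reduction)")
# baseline 400 MB -> compressed 3.6 MB (99.1% reduction)
```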
Code Learning
- Code learning tries to reconstruct original embedding by distributing information into multiple codebooks
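
A minimal PyTorch sketch of that idea, with my own illustrative hyperparameters and encoder shape (not the paper's exact setup): an encoder maps each pretrained vector to M Gumbel-softmax distributions over K codewords, and the reconstruction is the sum of the selected codewords from M shared codebooks, trained with an MSE reconstruction loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodeLearner(nn.Module):
    """Sketch of compositional code learning: predict M (soft) one-hot codes per
    word via Gumbel-softmax, reconstruct the embedding as the sum of the chosen
    codeword vectors from M codebooks."""
    def __init__(self, d, M=32, K=16, hidden=256):
        super().__init__()
        self.M, self.K = M, K
        self.encoder = nn.Sequential(
            nn.Linear(d, hidden), nn.Tanh(),
            nn.Linear(hidden, M * K),
        )
        # M codebooks, each holding K codeword vectors of dimension d
        self.codebooks = nn.Parameter(0.1 * torch.randn(M, K, d))

    def forward(self, emb, tau=1.0):
        logits = self.encoder(emb).view(-1, self.M, self.K)
        codes = F.gumbel_softmax(logits, tau=tau, dim=-1)            # (B, M, K)
        recon = torch.einsum('bmk,mkd->bd', codes, self.codebooks)   # sum over M codebooks
        return recon, codes

# Train to reconstruct a (stand-in) pretrained embedding table
d = 300
pretrained = torch.randn(10_000, d)      # replace with real pretrained embeddings
model = CodeLearner(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1_000):
    batch = pretrained[torch.randint(0, pretrained.size(0), (128,))]
    recon, _ = model(batch)
    loss = F.mse_loss(recon, batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, only the discrete codes (argmax per codebook) and the M shared codebooks need to be stored; the dense table can be discarded.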
Experiments
- Sentiment Analysis
- Machine Translation
Qualitative Analysis
- codebook population
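
To look at codebook population concretely, one could pull the hard codes out of the sketch above (this snippet assumes the `model` and `pretrained` tensor defined there):

```python
# Extract hard codes and count how often each codeword is used per codebook.
with torch.no_grad():
    logits = model.encoder(pretrained).view(-1, model.M, model.K)
    hard_codes = logits.argmax(dim=-1)               # (V, M) discrete codes to store
    for m in range(3):                               # inspect the first few codebooks
        counts = torch.bincount(hard_codes[:, m], minlength=model.K)
        print(f"codebook {m}: {counts.tolist()}")
```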
Personal Thoughts
- Compressing word embedding by `90%+` without performance loss is a great feat - this is still a dense representation, so it may be further compressed via quantization (or not)
- surprised how much disk space we were wasting
Link : https://arxiv.org/pdf/1711.01068.pdf
Authors : Shu and Nakayama, 2017