Compressing Word Embeddings via Deep Compositional Code Learning
Abstract
- propose multi-codebook quantization of word embeddings
- `V x d` word embedding is compressed into `M x K` multi-codebook embedding
- use Gumbel-softmax trick in quantization network
- Compression rate of 98% in sentiment analysis and 94~99% in machine translation w/o performance loss
Details
Introduction
- Word Embedding takes up memory/storage
- `100K` words with `1000` dimensions == `100M` embedding parameters, which is often `> 95%` of the parameters in NLP tasks (see the memory sketch after this list)
- follows the intuition of creating partially shared embeddings
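
A quick back-of-envelope check of where that memory goes. The `M`, `K` values below are illustrative codebook sizes, not the paper's exact per-task settings:

```python
import math

# Baseline embedding table vs. compositional codes + shared codebooks.
# V, d follow the example above; M, K are illustrative codebook sizes.
V, d = 100_000, 1000      # vocabulary size, embedding dimension
M, K = 32, 16             # number of codebooks, codewords per codebook

baseline_mb = V * d * 4 / 1e6                          # float32 embedding table
codebook_mb = M * K * d * 4 / 1e6                      # shared codeword vectors
codes_mb = V * M * math.ceil(math.log2(K)) / 8 / 1e6   # discrete codes per word

compressed_mb = codebook_mb + codes_mb
print(f"baseline {baseline_mb:.0f} MB -> compressed {compressed_mb:.1f} MB "
      f"({100 * (1 - compressed_mb / baseline_mb):.1f}% reduction)")
# baseline 400 MB -> compressed 3.6 MB (99.1% reduction)
```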
Code Learning
- Code learning tries to reconstruct original embedding by distributing information into multiple codebooks
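
A minimal PyTorch sketch of that idea, with my own illustrative hyperparameters and encoder shape (not the paper's exact setup): an encoder maps each pretrained vector to M Gumbel-softmax distributions over K codewords, and the reconstruction is the sum of the selected codewords from M shared codebooks, trained with an MSE reconstruction loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodeLearner(nn.Module):
    """Sketch of compositional code learning: predict M (soft) one-hot codes per
    word via Gumbel-softmax, reconstruct the embedding as the sum of the chosen
    codeword vectors from M codebooks."""
    def __init__(self, d, M=32, K=16, hidden=256):
        super().__init__()
        self.M, self.K = M, K
        self.encoder = nn.Sequential(
            nn.Linear(d, hidden), nn.Tanh(),
            nn.Linear(hidden, M * K),
        )
        # M codebooks, each holding K codeword vectors of dimension d
        self.codebooks = nn.Parameter(0.1 * torch.randn(M, K, d))

    def forward(self, emb, tau=1.0):
        logits = self.encoder(emb).view(-1, self.M, self.K)
        codes = F.gumbel_softmax(logits, tau=tau, dim=-1)            # (B, M, K)
        recon = torch.einsum('bmk,mkd->bd', codes, self.codebooks)   # sum over M codebooks
        return recon, codes

# Train to reconstruct a (stand-in) pretrained embedding table
d = 300
pretrained = torch.randn(10_000, d)      # replace with real pretrained embeddings
model = CodeLearner(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1_000):
    batch = pretrained[torch.randint(0, pretrained.size(0), (128,))]
    recon, _ = model(batch)
    loss = F.mse_loss(recon, batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, only the discrete codes (argmax per codebook) and the M shared codebooks need to be stored; the dense table can be discarded.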
Experiments
- Sentiment Analysis
- Machine Translation
Qualitative Analysis
- codebook population
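
To look at codebook population concretely, one could pull the hard codes out of the sketch above (this snippet assumes the `model` and `pretrained` tensor defined there):

```python
# Extract hard codes and count how often each codeword is used per codebook.
with torch.no_grad():
    logits = model.encoder(pretrained).view(-1, model.M, model.K)
    hard_codes = logits.argmax(dim=-1)               # (V, M) discrete codes to store
    for m in range(3):                               # inspect the first few codebooks
        counts = torch.bincount(hard_codes[:, m], minlength=model.K)
        print(f"codebook {m}: {counts.tolist()}")
```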
Personal Thoughts
- Compressing word embedding by `90%+` without performance loss is a great feat - this is still a dense representation, so it may be further compressed via quantization (or not)
- surprised how much disk space we were wasting
Link : https://arxiv.org/pdf/1711.01068.pdf
Authors : Shu and Nakayama, 2017