Compressing Word Embeddings via Deep Compositional Code Learning

Abstract

  • propose multi-codebook quantization of word embeddings
  • the V x d word embedding matrix is compressed into M codebooks with K codewords each
  • use the Gumbel-softmax trick to make the discrete code selection in the quantization network differentiable
  • Compression rate of 98% in sentiment analysis and 94~99% in machine translation w/o performance loss

Details

Introduction

  • Word Embedding takes up significant memory/storage
    • 100K words with 1000 dimensions == 100M embedding parameters, which is often > 95% of the parameters in NLP tasks (a rough size estimate is sketched below)
  • follows the intuition of creating partially shared embeddings
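
A rough sanity check of the parameter count above (a minimal sketch; the fp32 storage assumption is mine, not from the note):

```python
# Back-of-the-envelope size of a dense 100K x 1000 embedding matrix,
# assuming fp32 storage (4 bytes per parameter) -- an assumption, not from the paper.
vocab_size, embed_dim = 100_000, 1_000
n_params = vocab_size * embed_dim          # 100M parameters
size_mb = n_params * 4 / 1024 ** 2         # ~381 MB in fp32
print(f"{n_params:,} parameters, ~{size_mb:.0f} MB")
```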

Code Learning

[figure: code learning architecture]

  • Code Learning tries to reconstruct the original embedding by distributing information across multiple codebooks
    • each embedding is computed as the sum of M codeword vectors, one selected from each codebook
      [equation: composed embedding as a sum of M codeword vectors]
    • training works like an auto-encoder, using the Gumbel-softmax trick to backpropagate through the discrete code selection (a minimal sketch follows this list)
      [equation: Gumbel-softmax relaxation of the code selection]
    • the objective is to minimize the reconstruction error against the original embedding
      [equation: squared reconstruction loss]
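
A minimal sketch of the code-learning auto-encoder described above, assuming PyTorch; the encoder architecture, hidden size, and training loop are illustrative guesses, not the exact setup from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodeLearner(nn.Module):
    def __init__(self, embed_dim, M=32, K=16, hidden=256):
        super().__init__()
        self.M, self.K = M, K
        # encoder: original embedding -> M sets of K logits (one code per codebook)
        self.encoder = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, M * K),
        )
        # decoder parameters: M codebooks, each holding K codeword vectors of size embed_dim
        self.codebooks = nn.Parameter(torch.randn(M, K, embed_dim) * 0.01)

    def forward(self, emb, tau=1.0):
        logits = self.encoder(emb).view(-1, self.M, self.K)
        # Gumbel-softmax: differentiable (approximately one-hot) code selection
        codes = F.gumbel_softmax(logits, tau=tau, dim=-1)         # (B, M, K)
        # reconstruction = sum over the M codebooks of the selected codewords
        recon = torch.einsum("bmk,mkd->bd", codes, self.codebooks)
        return recon, codes

# toy training loop: learn to reconstruct pretrained (e.g. GloVe) vectors
embed_dim, vocab = 300, 1000                   # toy sizes for illustration
pretrained = torch.randn(vocab, embed_dim)     # stands in for real GloVe vectors
model = CodeLearner(embed_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    recon, _ = model(pretrained)
    loss = F.mse_loss(recon, pretrained)       # squared reconstruction error
    opt.zero_grad(); loss.backward(); opt.step()
```

At inference time, the soft Gumbel-softmax selection would be replaced by a hard argmax, so each word is stored as M small integers plus the shared M x K x d codebooks.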

Experiments

  • Sentiment Analysis
    • 75K x 300 GloVe embedding -> 32 x 16 coding scheme (512 x 300 codebook matrix) achieves a 98.4% compression rate in terms of file size (a back-of-the-envelope check is sketched after this list)
      [table: sentiment analysis results]
  • Machine Translation
    • 20K x 600 embedding -> 32 x 16 coding scheme (512 x 600 codebook matrix) achieves a 91.5% compression rate
    • the translation network has to be re-trained with the compressed embedding kept fixed
      [table: machine translation results]
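
A back-of-the-envelope check of the sentiment-analysis compression number above, assuming fp32 floats and M * log2(K) bits of codes per word (the exact file format used in the paper may differ):

```python
import math

# sentiment-analysis setting from the table above: 75K x 300 GloVe, 32 x 16 codes
V, d = 75_000, 300            # vocabulary size, embedding dimension
M, K = 32, 16                 # number of codebooks, codewords per codebook

original_bytes = V * d * 4                     # dense fp32 embedding matrix
code_bytes = V * M * math.log2(K) / 8          # 32 codes x 4 bits = 16 bytes per word
codebook_bytes = M * K * d * 4                 # 512 x 300 fp32 codebook matrix
compressed_bytes = code_bytes + codebook_bytes

rate = 1 - compressed_bytes / original_bytes   # ~98%, close to the reported 98.4%
print(f"original ~{original_bytes / 1e6:.0f} MB, "
      f"compressed ~{compressed_bytes / 1e6:.2f} MB, rate ~{rate:.1%}")
```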

Qualitative Analysis

  • similar words are assigned similar codes when coding GloVe embeddings
    [figure: codes of semantically similar words]

  • codebook population

    • a few codewords are very popular, while even the least popular one still accounts for at least 5%
      [figure: codeword usage distribution]

Personal Thoughts

  • Compressing word embeddings by 90%+ without performance loss is a great feat
  • the codebooks are still a dense real-valued representation and may be further compressed via quantization (or not)
  • surprised how much disk space we were wasting

Link : https://arxiv.org/pdf/1711.01068.pdf
Authors : Shu et al. 2017