bentrevett/pytorch-sentiment-analysis

Upgraded Sentiment Analysis notebook - dropout in embedding layer?

TalitaGroetzinger opened this issue · 4 comments

Hi!

Sorry for asking another question :).

I am a bit confused because I think that you are applying dropout on the embedding layer in the Upgraded Sentiment Analysis notebook:

embedded = self.dropout(self.embedding(text))

I thought that you shouldn't apply dropout to the embedding layers / input layers. You also mention that in the notebook: "note: never use dropout in the input or in the output layer (text or fc in this case)".

You should not use dropout on the actual input tokens, text, but you can - and should - use it on the embeddings obtained from those tokens, self.embedding(text).

This allows you to learn more robust embeddings.
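To make the placement concrete, here is a rough, self-contained sketch of where the dropout calls sit in a model like the one in the notebook (the class and layer names here are assumptions modelled on that architecture, not copied from it): dropout is applied to the embedding output and the final hidden state, but never to the raw token indices and never after the final linear layer.

import torch
import torch.nn as nn

class RNNClassifier(nn.Module):  # hypothetical name, stands in for the notebook's model
    def __init__(self, vocab_size, emb_dim, hid_dim, out_dim, dropout):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, bidirectional=True)
        self.fc = nn.Linear(hid_dim * 2, out_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        # text: [sent len, batch size] of token indices - no dropout on the raw tokens
        embedded = self.dropout(self.embedding(text))  # dropout on the learned embeddings
        output, (hidden, cell) = self.rnn(embedded)
        # concatenate the final forward and backward hidden states, then apply dropout
        hidden = self.dropout(torch.cat((hidden[-2], hidden[-1]), dim=1))
        return self.fc(hidden)  # no dropout after the output layer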

aaah, okay, so does that mean that you do not randomly drop embeddings, but that you randomly drop properties of each word embedding? It's difficult for me to imagine what exactly happens if you apply dropout on the obtained embeddings (and why this is different from dropping tokens before feeding them into the embedding layer).

Your first statement is correct. When you apply dropout to, say, 64-dimensional embeddings, it will not set the entire 64-dimensional embedding of some percentage of the tokens to zero. Instead, for each token, it will set some percentage of those 64 dimensions to zero.

Thus, you don't drop out a whole token; you drop out individual elements of each token's embedding vector.

An example:

import torch
import torch.nn as nn

x = torch.LongTensor([[1,2,3,4,5,6,7,8,9,0]])  # a batch of one sequence of 10 token indices
embedding = nn.Embedding(10, 5)                # vocab size 10, 5-dimensional embeddings
dropout = nn.Dropout(0.25)                     # each element is zeroed with probability 0.25

print(dropout(embedding(x)))
"""
prints out something like
tensor([[[ 1.2195, -0.0000,  0.0000, -0.1915, -1.4923], #2 dims dropped out for 1st token
         [ 0.0000, -0.0000, -0.0000,  2.9155, -1.5389], #3 dims dropped out for 2nd token
         [-1.6213, -1.3348,  0.4382, -0.0000,  0.0000], #2 dims dropped out for 3rd token
         [-0.9286,  1.4961, -0.0000, -0.0000, -0.0000], #etc.
         [ 0.2175, -1.1769, -2.0785,  0.0000,  0.0033],
         [ 0.0000,  0.1375,  0.3578,  1.2870,  0.2853],
         [ 0.4859,  1.5290, -0.0000, -0.9282,  0.8301],
         [ 0.2195, -0.7541, -0.8038,  0.0000,  2.9087],
         [-0.0000,  0.0000,  0.2561, -0.0000,  0.3464],
         [-0.0000,  0.6796,  0.0638,  2.1675,  0.0000]]],
       grad_fn=<MulBackward0>)"""
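
For contrast, dropping whole tokens (which is not what the notebook does) would mean zeroing entire rows of the embedded tensor rather than individual elements - a quick sketch using a hand-made Bernoulli mask over the sequence positions:

import torch
import torch.nn as nn

x = torch.LongTensor([[1,2,3,4,5,6,7,8,9,0]])
embedding = nn.Embedding(10, 5)
embedded = embedding(x)                                # shape [1, 10, 5]

p = 0.25
keep = torch.bernoulli(torch.full((1, 10, 1), 1 - p))  # one keep/drop decision per token
print(embedded * keep / (1 - p))                       # whole rows zeroed, survivors rescaled

Also note that nn.Dropout is only active in training mode (it becomes a no-op after model.eval()), and that during training the surviving elements are scaled by 1 / (1 - p) so the expected activation stays the same - which is why the non-zero values in the printout above are not exactly the raw embedding values.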

I see now :), thanks a lot!!!