bentrevett/pytorch-sentiment-analysis

Using BERT embeddings with LSTM

rajae-Bens opened this issue · 2 comments

Hi,

First of all, I want to thank you for these amazing tutorials. They really helped me understand a lot about using DL with NLP.
I tried to use BERT embeddings with an LSTM classifier for multi-class classification (notebook: 6 - Transformers for Sentiment Analysis.ipynb),

but I got an error when trying to train the model.
Here is the implementation of the LSTM classifier:

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

class LSTMClassifier(nn.Module):
    def __init__(self, batch_size, output_size, hidden_size, bert):
        super(LSTMClassifier, self).__init__()

        self.bert = bert
        self.batch_size = batch_size
        self.output_size = output_size
        self.hidden_size = hidden_size

        embedding_dim = bert.config.to_dict()['hidden_size']
        self.lstm = nn.LSTM(embedding_dim, hidden_size)
        self.label = nn.Linear(hidden_size, output_size)

    def forward(self, input_sentence, batch_size=None):

        # BERT is frozen and used purely to produce contextual embeddings
        with torch.no_grad():
            embedded = self.bert(input_sentence)[0]

        # [batch size, seq len, emb dim] -> [seq len, batch size, emb dim]
        input = embedded.permute(1, 0, 2)

        if batch_size is None:
            h_0 = Variable(torch.zeros(1, self.batch_size, self.hidden_size).cuda()) # initial hidden state of the LSTM
            c_0 = Variable(torch.zeros(1, self.batch_size, self.hidden_size).cuda()) # initial cell state of the LSTM
        else:
            h_0 = Variable(torch.zeros(1, batch_size, self.hidden_size).cuda())
            c_0 = Variable(torch.zeros(1, batch_size, self.hidden_size).cuda())

        output, (final_hidden_state, final_cell_state) = self.lstm(input, (h_0, c_0))
        # final_hidden_state: [1, batch size, hidden size] -> final_output: [batch size, output size]
        final_output = self.label(final_hidden_state[-1])

        return final_output
```
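For context, a model like this could be instantiated along these lines (a minimal sketch assuming the pretrained `BertModel` from the transformers library, as in the notebook; the hyperparameter values below are placeholders):

```python
from transformers import BertModel

bert = BertModel.from_pretrained('bert-base-uncased')

# placeholder hyperparameters for illustration
model = LSTMClassifier(batch_size=32, output_size=4, hidden_size=256, bert=bert).cuda()
```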

I also changed the binary_accuracy function to:

```python
def categorical_accuracy(preds, y):
    """
    Returns accuracy per batch, i.e. if you get 8/10 right, this returns 0.8, NOT 8
    """
    max_preds = preds.argmax(dim = 1, keepdim = True) # get the index of the max probability
    correct = max_preds.squeeze(1).eq(y)
    return correct.sum() / torch.cuda.FloatTensor([y.shape[0]])
```
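As a quick sanity check (a made-up example, not from my actual data), three correct predictions out of four should give 0.75:

```python
import torch

# hypothetical logits for 4 examples over 3 classes; rows 0-2 are classified
# correctly, row 3 is not, so the expected accuracy is 3/4 = 0.75
preds = torch.tensor([[0.9, 0.05, 0.05],
                      [0.1, 0.80, 0.10],
                      [0.2, 0.20, 0.60],
                      [0.7, 0.20, 0.10]]).cuda()
y = torch.tensor([0, 1, 2, 2]).cuda()

print(categorical_accuracy(preds, y))  # tensor([0.7500], device='cuda:0')
```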

and the error I get is:

```
RuntimeError: Expected hidden[0] size (1, 20, 256), got (1, 32, 256)
```

Could you help, please?
Thank you

This will be because the number of examples in your dataset doesn't divide exactly by 32, so the final batch has fewer than 32 elements in it; here it's 20.

When you create your h_0 and c_0, you use self.batch_size, which is always 32, but the actual batch size might be different. One solution is to get the batch size in the forward method from input_sentence.shape. The other is to not bother declaring h_0 and c_0 at all, as by default PyTorch will create initial hidden and cell states of all zeros if none are provided (see the sketch below).
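A minimal sketch of that second option, assuming the same class as above (the explicit h_0/c_0 logic and the batch_size argument simply go away; the first option would instead read the batch size from input_sentence.shape):

```python
def forward(self, input_sentence):
    # BERT stays frozen; its output serves as contextual embeddings
    with torch.no_grad():
        embedded = self.bert(input_sentence)[0]  # [batch size, seq len, emb dim]

    # nn.LSTM expects [seq len, batch size, emb dim] by default
    embedded = embedded.permute(1, 0, 2)

    # with no (h_0, c_0) passed, PyTorch creates zero-filled initial states
    # sized for the actual batch, so a short final batch works too
    output, (final_hidden_state, final_cell_state) = self.lstm(embedded)

    return self.label(final_hidden_state[-1])
```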

Yes, it worked. Thanks for your help!