Deep network a la Covington paper
aimran opened this issue · 3 comments
Recently came across this project. Massively impressed by the clean and intuitive API!
I started with the basic `Pooling` method since it seemed to be the simplest. Then I noticed the Covington reference -- the paper seems to imply a deeper network, whereas `PoolNet` uses a single layer. Am I completely misreading it? (Feel free to tell me to shut up :-D)
Wondering if you experimented with adding additional layers? I'd be happy to dig into it otherwise.
Best
Asif
You can definitely try stacking more layers on top. You can do this by writing your own version of the pooling layer, then passing it as the `representation` argument of the model constructor.
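To make the idea concrete, here is a minimal numpy sketch of what such a deeper pooling representation could look like, in the spirit of Covington et al.: pool the item embeddings over the sequence, then push the result through a couple of extra dense layers. All names, shapes, and the `deep_pooling_representation` function are illustrative assumptions, not the library's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

num_items, embedding_dim, hidden_dim = 1000, 32, 64

# Hypothetical item embedding table (would normally be learned).
item_embeddings = rng.normal(scale=0.1, size=(num_items, embedding_dim))

# Two extra dense layers stacked on top of the pooled embedding.
w1 = rng.normal(scale=0.1, size=(embedding_dim, hidden_dim))
w2 = rng.normal(scale=0.1, size=(hidden_dim, embedding_dim))


def deep_pooling_representation(item_ids):
    """Average the sequence's embeddings, then apply a small ReLU MLP."""
    pooled = item_embeddings[item_ids].mean(axis=0)  # (embedding_dim,)
    hidden = np.maximum(w1.T @ pooled, 0.0)          # (hidden_dim,)
    return np.maximum(w2.T @ hidden, 0.0)            # (embedding_dim,)


user_vector = deep_pooling_representation([3, 17, 256])
print(user_vector.shape)  # (32,)
```

In a real implementation the extra layers would be trainable modules inside the representation object you pass to the model constructor; the point is only that the pooling step and the depth of the network are independent choices.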
Assuming you are using the sequential models, I would strongly recommend building on top of the LSTM-based representation: it gets much better results.
Thanks for the tip. And I am using the seq models, as you'd guessed. I will start with the LSTM.
I guess what tripped me up with `Pooling` was that the `target` and `input` were sharing the same `item_embeddings` -- this may(?) not necessarily be true if one were to add layers. Covington et al. seem to suggest that the penultimate layer becomes a pseudo-embedding of sorts.
As you say, the default implementations share the embeddings for the input and output layers. You can change that with your own representation layer. Whether or not the input and output embeddings are shared is unrelated to the number of layers; you can have a deep model without tying the embeddings. It may be worth trying both and seeing which one works better!
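A small sketch may help separate the two ideas. Below, the sequence is encoded with one embedding table while scoring uses a second, untied table, so the pooled user vector plays the role of the "pseudo-embedding" mentioned above. The table names and the `score` function are hypothetical, not the library's API; a tied model would simply reuse one table for both roles.

```python
import numpy as np

rng = np.random.default_rng(1)
num_items, dim = 500, 16

# Untied tables: one for encoding the input sequence, one for scoring targets.
input_embeddings = rng.normal(scale=0.1, size=(num_items, dim))
output_embeddings = rng.normal(scale=0.1, size=(num_items, dim))


def score(sequence, candidate_items):
    """Score candidates against a user vector pooled from the input table."""
    user_repr = input_embeddings[sequence].mean(axis=0)  # (dim,)
    return output_embeddings[candidate_items] @ user_repr


scores = score([5, 42, 99], [0, 1, 2])
print(scores.shape)  # (3,)
```

Tying would replace `output_embeddings` with `input_embeddings` in `score`; everything else, including any extra layers between pooling and scoring, stays the same.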