dandxy89/DeepLearningReading

Replicating the work by DeepMind - Attentive Reader


Language:

Python (Keras w/ Theano backend)

Model

The Attentive Reader, as described in the paper ("Teaching Machines to Read and Comprehend", Hermann et al., 2015), uses an attention mechanism inspired by recent results in machine translation and image recognition.

From the paper's conclusion:

> The attention mechanism that we have employed is just one instantiation of a very general idea which can be further exploited. However, the incorporation of world knowledge and multi-document queries will also require the development of attention and embedding mechanisms whose complexity to query does not scale linearly with the data set size. There are still many queries requiring complex inference and long range reference resolution that our models are not yet able to answer. As such our data provides a scalable challenge that should support NLP research into the future. Further, significantly bigger training data sets can be acquired using the techniques we have described, undoubtedly allowing us to train more expressive and accurate models.
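
For concreteness, here is a minimal NumPy sketch of the attention step the paper describes (the m, s, r, g equations of the Attentive Reader). It is framework-agnostic pseudocode rather than this repository's Keras implementation; all dimensions and randomly initialised weights are illustrative placeholders, not the values used here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy sizes, chosen only for illustration.
T, d_y, d_m = 10, 128, 100   # document length, encoder width, attention width

y = np.random.randn(T, d_y)  # encoder output for each document token, y(t)
u = np.random.randn(d_y)     # query encoding, u

# Randomly initialised matrices stand in for the learned parameters.
W_ym = 0.01 * np.random.randn(d_m, d_y)
W_um = 0.01 * np.random.randn(d_m, d_y)
w_ms = 0.01 * np.random.randn(d_m)
W_rg = 0.01 * np.random.randn(d_m, d_y)
W_ug = 0.01 * np.random.randn(d_m, d_y)

m = np.tanh(y.dot(W_ym.T) + W_um.dot(u))  # m(t) = tanh(W_ym y(t) + W_um u)
s = softmax(m.dot(w_ms))                  # s(t) ~ exp(w_ms' m(t)): token weights
r = s.dot(y)                              # r = sum_t s(t) y(t): weighted doc vector
g = np.tanh(W_rg.dot(r) + W_ug.dot(u))    # g = tanh(W_rg r + W_ug u)
```

The resulting vector `g` is the joint document/query embedding from which the answer token is predicted.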

Info:

Currently training the model on the CPU; I intend to move processing over to a GPU once I can see that performance is sufficiently worthwhile.
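
One common way to make that switch later is to select the device through `THEANO_FLAGS` before Theano is imported; this is a sketch, and `device=gpu` targets the older CUDA backend (newer Theano releases use `device=cuda` instead):

```python
import os

# Theano reads THEANO_FLAGS at import time, so the device has to be set
# before the first `import theano` (including an indirect import via Keras).
os.environ.setdefault("THEANO_FLAGS", "device=gpu,floatX=float32")

import theano
print(theano.config.device)  # 'gpu' if a CUDA device was found, else 'cpu'
```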

Performance (Accuracy %):

| Epoch | Validation | Test |
| ----- | ---------- | ---- |
| 1 | 0.251 | 0.219 |
| 2 | 0.807 | 0.768 |
| 3 | 1.363 | 1.316 |
| 4 | 1.919 | 1.864 |
| 5 | 2.475 | 2.412 |
| 6 | 3.603 | 3.556 |
| 7 | 4.731 | 4.699 |
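
For bookkeeping, per-epoch numbers like these can be collected with a custom Keras `Callback`. The sketch below is illustrative rather than this repository's actual logging code: `EvalLogger` and `test_data` are made-up names, and it assumes the model is compiled with `metrics=['accuracy']` and fit with `validation_data` so that `val_acc` appears in `logs`.

```python
from keras.callbacks import Callback

class EvalLogger(Callback):
    """Record validation and test accuracy at the end of each epoch."""

    def __init__(self, test_data):
        super(EvalLogger, self).__init__()
        self.test_data = test_data  # (x_test, y_test) tuple
        self.rows = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        x_test, y_test = self.test_data
        # evaluate() returns [loss, accuracy] when the model was compiled
        # with metrics=['accuracy'].
        _, test_acc = self.model.evaluate(x_test, y_test, verbose=0)
        val_acc = logs.get("val_acc", float("nan"))
        self.rows.append((epoch + 1, val_acc, test_acc))
        print("Epoch %d: val %.3f, test %.3f" % (epoch + 1, val_acc, test_acc))
```

Passing `EvalLogger((x_test, y_test))` in the `callbacks` argument of `model.fit` then produces a table like the one above.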

Once I have achieved > 10% accuracy, I intend to move the model over to an AWS instance.