
pytorch-rwa

This project is meant to be a PyTorch implementation of the RWA (Recurrent Weighted Average) paper.

So far the Add task works! I've also added a decay term that may need further tuning, but it seems to work nevertheless.
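For reference, here is a minimal sketch of how a decay factor could be folded into the RWA running average. The class name `DecayRWACellSketch`, the per-unit `decay` parameter, and the omission of the numerical-stability max trick are all assumptions for illustration, not the cell implemented in this repo.

```python
import torch
import torch.nn as nn


class DecayRWACellSketch(nn.Module):
    """Hypothetical sketch of one RWA step with a learned decay applied to the
    running numerator/denominator. Not the repo's actual cell."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.u = nn.Linear(input_size, hidden_size)                 # W_u x + b_u
        self.g = nn.Linear(input_size + hidden_size, hidden_size)   # W_g [x; h] + b_g
        self.a = nn.Linear(input_size + hidden_size, hidden_size, bias=False)
        # Per-unit decay parameter, squashed to (0, 1) in forward().
        self.decay = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x, state):
        h, n, d = state
        xh = torch.cat([x, h], dim=-1)
        z = self.u(x) * torch.tanh(self.g(xh))    # candidate value
        w = torch.exp(self.a(xh))                 # unnormalized attention weight
        gamma = torch.sigmoid(self.decay)         # decay factor in (0, 1)
        n = gamma * n + z * w                     # decayed running numerator
        d = gamma * d + w                         # decayed running denominator
        h = torch.tanh(n / (d + 1e-8))
        return h, (h, n, d)


# Example usage with zero-initialized state:
cell = DecayRWACellSketch(input_size=8, hidden_size=16)
h0 = torch.zeros(4, 16)
h1, state = cell(torch.randn(4, 8), (h0, torch.zeros(4, 16), torch.zeros(4, 16)))
```

With `gamma` near 1 the cell behaves like the original RWA (the average covers the whole history); smaller values bias the average toward recent steps.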

I've also added a new cell that resembles the CGRU cell from the Neural GPU but differs slightly in that it uses groups. Take a look!
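As a rough illustration of the idea, a CGRU-style update built from grouped convolutions might look like the following. The name `GroupedCGRUSketch`, the kernel size, and the group count are assumptions; the repo's `CGRURWACell` may combine this gating with the RWA state differently.

```python
import torch
import torch.nn as nn


class GroupedCGRUSketch(nn.Module):
    """Hypothetical sketch of a CGRU-style gated update using grouped
    convolutions; not necessarily how CGRURWACell is organized."""

    def __init__(self, channels, kernel_size=3, groups=4):
        super().__init__()
        padding = kernel_size // 2
        self.update_conv = nn.Conv1d(channels, channels, kernel_size,
                                     padding=padding, groups=groups)     # update gate
        self.reset_conv = nn.Conv1d(channels, channels, kernel_size,
                                    padding=padding, groups=groups)      # reset gate
        self.candidate_conv = nn.Conv1d(channels, channels, kernel_size,
                                        padding=padding, groups=groups)  # candidate state

    def forward(self, s):
        # s: (batch, channels, length) -- the state the CGRU rewrites in place.
        u = torch.sigmoid(self.update_conv(s))
        r = torch.sigmoid(self.reset_conv(s))
        c = torch.tanh(self.candidate_conv(r * s))
        return u * s + (1.0 - u) * c  # gated convolutional update


# Example usage (channels must be divisible by groups):
cell = GroupedCGRUSketch(channels=32, groups=4)
out = cell(torch.randn(2, 32, 10))
```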

TODO:

  • Train the network on all tasks in the original repo
    • Implemented AddTask
  • Use this in projects!

Notes:

  1. The new CGRURWACell is interesting because even if we process one whole sequence at a time, it will still pass information from sequence to sequence.

    1.1. If we process only one step of a sequence at a time, then the hidden state acts like it would in a normal RNN.

    1.2. Do we even want to keep the Variables used for the hidden states and backprop across sequences? This might work for a little while, but I'm worried it'll just become a chain that's way too long... (see the truncated-BPTT sketch after this list)
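On note 1.2, one common way to carry state across sequences without the graph growing unboundedly is to detach the hidden state between chunks (truncated backpropagation through time). The sketch below uses a plain `nn.GRU` and dummy data purely to show the state-handling pattern; it is not this repo's training loop.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a GRU stands in for the RWA/CGRU cell to illustrate
# how state can be carried across sequence chunks while cutting the graph.
rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()))
criterion = nn.MSELoss()

state = None
for step in range(100):
    x = torch.randn(4, 32, 8)    # dummy chunk: (batch, seq_len, features)
    y = torch.randn(4, 1)        # dummy target
    if state is not None:
        state = state.detach()   # keep the values, drop the backprop chain
    out, state = rnn(x, state)
    loss = criterion(head(out[:, -1]), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Gradients then only flow within each chunk, so memory stays bounded, while the hidden state still passes information from sequence to sequence as described in note 1.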