
pytorch-rwa

This project is meant to be a PyTorch implementation of the RWA (Recurrent Weighted Average) paper.

So far the Add task works! I've also added a decay term that may need further tuning, but it seems to work nevertheless.
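For reference, here is a minimal sketch of how a decay factor could be folded into the RWA running average. The class name `DecayRWACellSketch`, the per-unit `decay` parameter, and the omission of the numerical-stability max trick are all assumptions for illustration, not the cell implemented in this repo.

```python
import torch
import torch.nn as nn


class DecayRWACellSketch(nn.Module):
    """Hypothetical sketch of one RWA step with a learned decay applied to the
    running numerator/denominator. Not the repo's actual cell."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.u = nn.Linear(input_size, hidden_size)                 # W_u x + b_u
        self.g = nn.Linear(input_size + hidden_size, hidden_size)   # W_g [x; h] + b_g
        self.a = nn.Linear(input_size + hidden_size, hidden_size, bias=False)
        # Per-unit decay parameter, squashed to (0, 1) in forward().
        self.decay = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x, state):
        h, n, d = state
        xh = torch.cat([x, h], dim=-1)
        z = self.u(x) * torch.tanh(self.g(xh))    # candidate value
        w = torch.exp(self.a(xh))                 # unnormalized attention weight
        gamma = torch.sigmoid(self.decay)         # decay factor in (0, 1)
        n = gamma * n + z * w                     # decayed running numerator
        d = gamma * d + w                         # decayed running denominator
        h = torch.tanh(n / (d + 1e-8))
        return h, (h, n, d)


# Example usage with zero-initialized state:
cell = DecayRWACellSketch(input_size=8, hidden_size=16)
h0 = torch.zeros(4, 16)
h1, state = cell(torch.randn(4, 8), (h0, torch.zeros(4, 16), torch.zeros(4, 16)))
```

With `gamma` near 1 the cell behaves like the original RWA (the average covers the whole history); smaller values bias the average toward recent steps.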

I've also added a new cell that resembles the CGRU cell from the Neural GPU but differs slightly in that it uses groups. Take a look!
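As a rough illustration of the idea, a CGRU-style update built from grouped convolutions might look like the following. The name `GroupedCGRUSketch`, the kernel size, and the group count are assumptions; the repo's `CGRURWACell` may combine this gating with the RWA state differently.

```python
import torch
import torch.nn as nn


class GroupedCGRUSketch(nn.Module):
    """Hypothetical sketch of a CGRU-style gated update using grouped
    convolutions; not necessarily how CGRURWACell is organized."""

    def __init__(self, channels, kernel_size=3, groups=4):
        super().__init__()
        padding = kernel_size // 2
        self.update_conv = nn.Conv1d(channels, channels, kernel_size,
                                     padding=padding, groups=groups)     # update gate
        self.reset_conv = nn.Conv1d(channels, channels, kernel_size,
                                    padding=padding, groups=groups)      # reset gate
        self.candidate_conv = nn.Conv1d(channels, channels, kernel_size,
                                        padding=padding, groups=groups)  # candidate state

    def forward(self, s):
        # s: (batch, channels, length) -- the state the CGRU rewrites in place.
        u = torch.sigmoid(self.update_conv(s))
        r = torch.sigmoid(self.reset_conv(s))
        c = torch.tanh(self.candidate_conv(r * s))
        return u * s + (1.0 - u) * c  # gated convolutional update


# Example usage (channels must be divisible by groups):
cell = GroupedCGRUSketch(channels=32, groups=4)
out = cell(torch.randn(2, 32, 10))
```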

TODO:

  • Train the network on all tasks in the original repo
    • Implemented AddTask
  • Use this in projects!

Notes:

  1. The new CGRURWACell is interesting because even if we process one whole sequence at a time, it will still pass information from sequence to sequence.

    1.1. If we process only one step of a sequence at a time, then the hidden state acts like it would in a normal RNN.

    1.2. Do we even want to keep the Variables used for the hidden states and backprop across sequences? This might work for a little while, but I'm worried it'll just become a chain that's way too long... (see the truncated-BPTT sketch after this list)
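On note 1.2, one common way to carry state across sequences without the graph growing unboundedly is to detach the hidden state between chunks (truncated backpropagation through time). The sketch below uses a plain `nn.GRU` and dummy data purely to show the state-handling pattern; it is not this repo's training loop.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a GRU stands in for the RWA/CGRU cell to illustrate
# how state can be carried across sequence chunks while cutting the graph.
rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()))
criterion = nn.MSELoss()

state = None
for step in range(100):
    x = torch.randn(4, 32, 8)    # dummy chunk: (batch, seq_len, features)
    y = torch.randn(4, 1)        # dummy target
    if state is not None:
        state = state.detach()   # keep the values, drop the backprop chain
    out, state = rnn(x, state)
    loss = criterion(head(out[:, -1]), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Gradients then only flow within each chunk, so memory stays bounded, while the hidden state still passes information from sequence to sequence as described in note 1.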