torch/rnn

Current torch/rnn breaks older Element-Research-based code

tastyminerals opened this issue · 4 comments

Seriously guys.

Looks like the current torch/rnn introduces changes compared to Element-Research/rnn and is not compatible with older code. Once Torch users update, reinstall, or install the framework and try to run code that uses rnn, it will crash, and they will flood the issue tracker with miscellaneous error messages (varying with their model implementations) that all trace back to the same issue. What's even worse, rebuilding a Docker image now pulls all the new code and crashes in production because the older models are not compatible.

I am using a modified GRU.lua layer in 6 of my models, and all of them stopped working after the update. There is still no changelog and no tips on how to fix issues such as GRU_mod.lua expecting an nn.Module instance at arg 1. What is the status of stepmodule, which is now used both in the recurrent language model example ("recurrent-language-model.lua"), as an nn.Sequential container, and inside something called RecGRU.lua, which used to be a simple GRU layer? Why did you decide to replace the simple LSTM and GRU layers with recurrent versions, deprecating the former in the process? Sigh.

The latest update is a good way to force people to start using PyTorch.

You can install the original Element-Research/rnn by git cloning it and running luarocks make rocks/[TAB]. Ah, but you already know that. I am sorry about this. I tried to have Element Research transfer their rnn repo to torch/rnn, but they didn't like that idea. This is why we have two rnn repos now.

As for the changelog, good point. Basically, we made everything faster by moving the LSTM/GRU implementations to C/CUDA in RecGRU, RecLSTM, SeqGRU and SeqLSTM. We deprecated the slow code (GRU, LSTM, Recurrent, etc.). We also deprecated LinearNoBias, as this feature is now supported in the base Linear via the noBias() method.
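For anyone hitting the same errors, here is a minimal migration sketch. It assumes the new RecGRU takes the same (inputSize, outputSize) constructor arguments as the old GRU, that your model wraps the step module in nn.Sequencer, and that the bias-free Linear method is spelled noBias() per torch/nn naming conventions; adjust the names and sizes to your own code.

```lua
require 'rnn'

local inputSize, hiddenSize, vocabSize = 128, 256, 10000

-- Old (Element-Research/rnn), now deprecated:
--   local stepmodule = nn.GRU(inputSize, hiddenSize)
--   local proj = nn.LinearNoBias(hiddenSize, vocabSize)

-- New (torch/rnn): fused C/CUDA recurrent module as the step module
local stepmodule = nn.RecGRU(inputSize, hiddenSize)

local model = nn.Sequential()
   :add(nn.Sequencer(stepmodule))                                -- iterate the step module over time
   :add(nn.Sequencer(nn.Linear(hiddenSize, vocabSize):noBias())) -- replaces LinearNoBias
```

The same substitution (LSTM -> RecLSTM) should apply to LSTM-based models.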

Are there any benchmarks of how much faster it is now? Because if it is only 15% faster, I'd rather stay on the older codebase :)

I should always read the docs!

"A 1.9x speedup is obtained by computing every time-step using a single specialized module instead of using multiple basic nn modules. This also makes the model about 13.7 times more memory efficient."