OpenNMT/OpenNMT-py

[Educational purpose] Why is OpenNMT-py fast?

howardyclo opened this issue · 6 comments

Hello, I recently implemented a seq2seq model for practice and educational purposes.
Here is my code.

I also compared the performance to OpenNMT-py and found that this library is more
GPU-memory efficient and its training iterations are a lot faster. When running the following model:

  • word_vec_size=300
  • hidden_size=512
  • rnn_type=LSTM
  • batch_size=32

When trained on my grammatical error correction corpus (2,443,191 sentence pairs), OpenNMT-py takes only ~1 hour to complete an epoch (~76,000 iterations), while my code takes ~6 hours per epoch.

I am wondering what important optimizations I should still add compared to the OpenNMT-py codebase. When I tried OpenNMT-py I didn't specify shard_size, so I couldn't figure out why OpenNMT-py is fast. Which key scripts should I look at?

Appreciated.

Compared with other frameworks on GitHub, OpenNMT-py is fast, GPU-memory efficient, and performs well.
I have not studied the OpenNMT-py source code much.

Maybe only the engineers who built it can explain the tricks and the reasons.

Call @srush

srush commented

I was called :D

So our main aim is simplicity, not speed. That being said, there are a couple of optimizations that matter:

  • Use CuDNN when possible (always on the encoder, on the decoder when input_feed is 0); see the first sketch after this list.
  • Always avoid indexing / loops and use torch primitives.
  • When possible, batch softmax operations across time (this is the second most complicated part of the code); see the second sketch after this list.
  • Batch inference and beam search for translation (this is the most complicated part of the code).
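
To make the first point concrete, here is a minimal sketch (not OpenNMT-py's actual code; the sizes and variable names are only illustrative) contrasting a Python-level loop over nn.LSTMCell with a single nn.LSTM call over the whole sequence, which on GPU dispatches to cuDNN's fused RNN kernel:

```python
import torch
import torch.nn as nn

seq_len, batch_size, word_vec_size, hidden_size = 50, 32, 300, 512
emb = torch.randn(seq_len, batch_size, word_vec_size)

# Slow pattern: a Python loop over time steps with LSTMCell.
# Every step launches its own small kernels and adds per-step
# autograd bookkeeping.
cell = nn.LSTMCell(word_vec_size, hidden_size)
h = emb.new_zeros(batch_size, hidden_size)
c = emb.new_zeros(batch_size, hidden_size)
step_outputs = []
for t in range(seq_len):
    h, c = cell(emb[t], (h, c))
    step_outputs.append(h)
loop_out = torch.stack(step_outputs)        # (seq_len, batch, hidden)

# Fast pattern: one nn.LSTM call over the whole sequence.
# On GPU this runs cuDNN's fused RNN kernel, which is why the encoder
# (and the decoder when input feeding is off) can be so much faster.
rnn = nn.LSTM(word_vec_size, hidden_size)
fused_out, (h_n, c_n) = rnn(emb)            # (seq_len, batch, hidden)
```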
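
And for the point about batching softmax operations across time, a hedged sketch of the general idea (again, not the actual OpenNMT-py generator code; generator, dec_out, and the sizes are assumptions for illustration): instead of projecting to the vocabulary and taking a softmax once per decoder step, flatten the time and batch dimensions and do it in one shot:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

seq_len, batch_size, hidden_size, vocab_size = 50, 32, 512, 50000
dec_out = torch.randn(seq_len, batch_size, hidden_size)    # all decoder states
generator = nn.Linear(hidden_size, vocab_size)             # output projection

# Rather than calling the generator once per time step inside a loop,
# flatten time and batch into one dimension and do a single large
# matmul + log-softmax over (seq_len * batch) rows.
flat = dec_out.view(-1, hidden_size)                   # (seq_len*batch, hidden)
log_probs = F.log_softmax(generator(flat), dim=-1)     # (seq_len*batch, vocab)

# The NLL loss can then be computed in one shot against flattened targets.
targets = torch.randint(0, vocab_size, (seq_len * batch_size,))
loss = F.nll_loss(log_probs, targets, reduction='sum')
```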

Awesome to hear you are working on GEC, it's a neat problem.

Cheers!

@srush Thanks for the reply! It's helpful!
I think I should study the OpenNMT-py codebase in order to write more optimized code, because training speed and memory usage really matter a lot. Recently I've come up with an idea for a different way to train the GEC task, which requires crafting a new model. The new model is basically a seq2seq with a dynamic-memory flavor to it. But before that, I really need an efficient codebase first :-(

@howardyclo can I ask what GEC is?

srush commented

Grammatical Error Correction.

Feel free to just use our code. It is pretty modular, and it can be more fun to develop with others.

We will likely add some GEC-specific features as well. One of our students works on that.

@srush
After digging into the onmt codebase, I found that the key point for speeding up training is "bucketing". When I trained on my own GEC corpus (one epoch = 2 million sentence pairs), it took me 7 hours to complete an epoch. When I replaced my own dataloader with onmt's data iterator, the training time dropped to 1.5 hours! That's quite mind-blowing! :-) Besides, the performance improved too.
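
For readers who want to see what "bucketing" buys you, here is a minimal, self-contained sketch of the idea (the function bucket_batches and the toy data are hypothetical, not OpenNMT-py's actual data iterator): sort examples by length before slicing them into batches, so that each batch holds sequences of nearly equal length and almost no padding:

```python
import random
import torch
from torch.nn.utils.rnn import pad_sequence

def bucket_batches(pairs, batch_size):
    """Yield padded (src, tgt) batches of similar-length sentence pairs.

    Sorting by source length before slicing into batches means each batch
    contains sequences of nearly equal length, so almost no time steps are
    spent on <pad> tokens.
    """
    pairs = sorted(pairs, key=lambda p: len(p[0]))
    batches = [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
    random.shuffle(batches)          # shuffle whole batches, not individual pairs
    for batch in batches:
        src = pad_sequence([torch.tensor(s) for s, _ in batch], padding_value=0)
        tgt = pad_sequence([torch.tensor(t) for _, t in batch], padding_value=0)
        yield src, tgt               # shapes: (max_len_in_batch, batch_size)

# Toy usage with token-id sequences of very different lengths.
data = [([1, 2, 3], [4, 5]), ([1] * 20, [2] * 22), ([7, 8], [9]), ([3] * 19, [6] * 18)]
for src, tgt in bucket_batches(data, batch_size=2):
    print(src.shape, tgt.shape)
```

Because the per-batch sequence length shrinks to roughly the longest real sentence in that batch, the RNN wastes far fewer time steps on padding, which is consistent with the 7 hours to 1.5 hours speedup reported above.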