nicolas-ivanov/tf_seq2seq_chatbot

How long should I train in CPU?

guotong1988 opened this issue · 1 comment

global step 2200 learning rate 0.5000 step-time 2.00 perplexity 37.43
  eval: bucket 0 perplexity 20.10
  eval: bucket 1 perplexity 33.75
  eval: bucket 2 perplexity 34.75
  eval: bucket 3 perplexity 43.44
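For context on the numbers in the log above: in the standard TensorFlow seq2seq tutorial code, the reported perplexity is simply the exponential of the average per-token cross-entropy loss. A minimal sketch (the overflow guard mirrors the tutorial's behavior; the exact loss value shown is illustrative):

```python
import math

def perplexity(avg_cross_entropy_loss):
    """Perplexity = exp(loss); lower means the model is less 'surprised'.

    Guard against overflow for very large losses, as the tutorial code does.
    """
    if avg_cross_entropy_loss < 300:
        return math.exp(avg_cross_entropy_loss)
    return float("inf")

# A per-token loss of roughly 3.62 nats corresponds to a perplexity of
# roughly 37, in the ballpark of the training log above.
print(perplexity(3.62))
```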

result:

how old are you ?
i ' m not .

@guotong1988, I would not even try training seq2seq models on a CPU, since a simple analysis shows that a typical GPU yields a 40x-80x speedup compared to a CPU. AWS has some affordable options; you can check them out here: https://aws.amazon.com/ec2/pricing/

In case your only option is still CPU, I would recommend using very modest parameters for your model, i.e. equal to or lower than the following:

  • 1 LSTM layer x 512 neurons
  • w2v embeddings dimensionality = 128
  • max length of input and output sequences = 16 (i.e. bucket sizes should not exceed this value)
  • max vocabulary size = 20000
  • batch size - as big as possible while still fitting in the RAM of your machine
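To make the limits above concrete, here is a rough sketch of what such a CPU-friendly configuration could look like. Note that the names below are illustrative placeholders, not necessarily the exact flags used in this repository:

```python
# Hypothetical hyperparameter settings for CPU-only training.
# Names are illustrative, not the repo's exact flag names.
CPU_CONFIG = {
    "num_layers": 1,       # a single LSTM layer
    "layer_size": 512,     # 512 neurons per layer
    "embedding_dim": 128,  # w2v embedding dimensionality
    "max_seq_len": 16,     # cap on input/output sequence length
    "vocab_size": 20000,   # maximum vocabulary size
}

# Example bucket definitions as (input_len, output_len) pairs; per the
# advice above, no bucket dimension should exceed max_seq_len.
BUCKETS = [(5, 10), (10, 15), (16, 16)]

assert all(
    i <= CPU_CONFIG["max_seq_len"] and o <= CPU_CONFIG["max_seq_len"]
    for i, o in BUCKETS
)
```

Batch size is left out on purpose: pick the largest value that fits in your machine's RAM.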

This should significantly increase your chances of generating some meaningful answers on a CPU in a realistic amount of time. However, my main message remains the same - deep learning tastes much better when served with good GPUs.