How long should I train on CPU?
guotong1988 opened this issue · 1 comment
guotong1988 commented
global step 2200 learning rate 0.5000 step-time 2.00 perplexity 37.43
eval: bucket 0 perplexity 20.10
eval: bucket 1 perplexity 33.75
eval: bucket 2 perplexity 34.75
eval: bucket 3 perplexity 43.44
result:
how old are you ?
i ' m not .
nicolas-ivanov commented
@guotong1988, I would not even try training seq2seq models on a CPU, since a simple analysis shows that a typical GPU yields a 40x-80x speedup compared to a CPU. AWS has some affordable options; you can check them out here: https://aws.amazon.com/ec2/pricing/
In case your only option is still CPU, I would recommend using very modest parameters for your model, i.e. equal to or lower than the following (see the configuration sketch below):
- 1 LSTM layer x 512 neurons
- w2v embedding dimensionality = 128
- max length of input and output sequences = 16 (i.e. bucket sizes should not exceed this value)
- max vocabulary size = 20000
- batch size = as big as possible while still fitting in your machine's RAM
This should significantly increase your chances of generating some meaningful answers on CPU in a realistic period of time. However, my main message remains the same - deep learning tastes much better when served with good GPUs.
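Here is a minimal sketch of what such a CPU-friendly configuration might look like, expressed as a plain Python config object. The field names, the `batch_size = 64` default, and the example bucket sizes are illustrative assumptions, not the flags of any particular seq2seq training script; map them onto whatever options your own script exposes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CpuFriendlyConfig:
    # Hypothetical hyperparameter names; adapt to your training script's flags.
    num_layers: int = 1          # single LSTM layer
    layer_size: int = 512        # 512 units (neurons) in that layer
    embedding_dim: int = 128     # w2v embedding dimensionality
    max_seq_len: int = 16        # longest allowed input/output sequence
    vocab_size: int = 20000      # cap on vocabulary size
    batch_size: int = 64         # assumed value; raise it as far as RAM allows
    # Example (encoder, decoder) buckets; none exceeds max_seq_len.
    buckets: List[Tuple[int, int]] = field(
        default_factory=lambda: [(4, 4), (8, 8), (12, 12), (16, 16)]
    )

cfg = CpuFriendlyConfig()
# Sanity check: bucket sizes must respect the maximum sequence length.
assert all(enc <= cfg.max_seq_len and dec <= cfg.max_seq_len
           for enc, dec in cfg.buckets)
print(cfg)
```

The point of the sketch is only to show the relative scale of the settings (one small layer, short sequences, modest vocabulary); the exact values are a starting point to tune against your available memory and patience.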