EleutherAI/gpt-neo

Mesh TensorFlow CPU Inference

pablogranolabar opened this issue · 2 comments

Since your implementation is based on Mesh TensorFlow, which natively supports running on CPU, why wouldn't a multi-CPU mesh work for GPT-Neo inference, assuming enough memory is available per CPU node (say, 10 GB)?
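To make the question concrete, here's the kind of thing I have in mind: a rough, untested sketch that lowers a toy Mesh TensorFlow computation onto CPU devices via `PlacementMeshImpl`. The mesh shape, layout, and toy matmul are illustrative, not taken from the GPT-Neo code:

```python
# Untested sketch: lowering a Mesh TensorFlow graph onto CPU devices.
# The mesh/layout names and the toy matmul are illustrative only.
import mesh_tensorflow as mtf
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

graph = mtf.Graph()
mesh = mtf.Mesh(graph, "cpu_mesh")

# A 2-way mesh; the "hidden" dimension is split across it.
mesh_shape = mtf.convert_to_shape("all:2")
layout_rules = mtf.convert_to_layout_rules("hidden:all")
devices = ["/device:CPU:0", "/device:CPU:0"]  # two logical devices on one host, for demo
mesh_impl = mtf.placement_mesh_impl.PlacementMeshImpl(
    mesh_shape, layout_rules, devices)

# Toy computation: y = x @ w, contracting over the split "hidden" dimension,
# which forces an allreduce across the CPU mesh.
batch = mtf.Dimension("batch", 4)
hidden = mtf.Dimension("hidden", 8)
out = mtf.Dimension("out", 2)
x = mtf.ones(mesh, mtf.Shape([batch, hidden]))
w = mtf.ones(mesh, mtf.Shape([hidden, out]))
y = mtf.einsum([x, w], output_shape=mtf.Shape([batch, out]))

lowering = mtf.Lowering(graph, {mesh: mesh_impl})
y_tf = lowering.export_to_tf_tensor(y)

with tf.Session() as sess:
    print(sess.run(y_tf))  # expect a 4x2 tensor of 8.0
```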

It probably would, but we have had no need to use it and therefore no motivation to test or implement it. If you open a PR with this feature, I'll review it.

Hi Stella,

My thought is that if inference can be parallelized across CPUs via Mesh TensorFlow, GPT-Neo would be an ideal fit for low-cost microservice inference endpoints, which could be substantially cheaper than GPU inference. I'll open a PR with the details.
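For the multi-node case, my (untested) assumption is that the same `PlacementMeshImpl` could be handed remote CPU devices through a standard TF1 cluster; the worker addresses below are placeholders:

```python
# Hypothetical multi-host CPU device list for PlacementMeshImpl.
# Worker addresses are placeholders; this is an untested assumption.
import tensorflow.compat.v1 as tf

cluster = tf.train.ClusterSpec(
    {"worker": ["node0:2222", "node1:2222", "node2:2222", "node3:2222"]})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# One CPU device per worker task, passed as `devices` to PlacementMeshImpl.
devices = ["/job:worker/task:%d/device:CPU:0" % t
           for t in range(cluster.num_tasks("worker"))]

# A session pointed at the cluster can then place each mesh slice on its node:
# with tf.Session(server.target) as sess: ...
```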