Mesh TensorFlow CPU Inference
pablogranolabar opened this issue · 2 comments
pablogranolabar commented
If your implementation is based on Mesh TensorFlow which natively supports CPU inference, why wouldn't a multi-CPU mesh work for GPT-Neo inference if enough memory is available per CPU node (say 10GB)?
StellaAthena commented
It probably would, but we have had no need to use it and therefore no motivation to test or implement it. If you open a PR with this feature I'll review it.
pablogranolabar commented
Hi Stella,
My thought is that if inference can be parallelized on CPU via Mesh TensorFlow, GPT-Neo would be an ideal candidate for low-cost microservice inference endpoints, which could be substantially cheaper than GPU inference. I'll open a PR with the details.
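To make the cost argument concrete, here is a back-of-the-envelope comparison. Every number below (instance prices, throughputs) is an illustrative assumption, not a measurement of GPT-Neo:

```python
# Back-of-the-envelope cost comparison. All inputs are assumed
# placeholder values, not benchmarks.
gpu_hourly_cost = 3.00   # assumed: one datacenter GPU instance, $/hr
cpu_hourly_cost = 0.40   # assumed: one ~10 GB-RAM CPU node, $/hr

gpu_tokens_per_hour = 1_000_000  # assumed GPU inference throughput
cpu_tokens_per_hour = 200_000    # assumed single-CPU-node throughput

# Cost per million generated tokens on each platform.
gpu_cost_per_mtok = gpu_hourly_cost / (gpu_tokens_per_hour / 1e6)
cpu_cost_per_mtok = cpu_hourly_cost / (cpu_tokens_per_hour / 1e6)

print(f"GPU: ${gpu_cost_per_mtok:.2f} per 1M tokens")
print(f"CPU: ${cpu_cost_per_mtok:.2f} per 1M tokens")
```

Note the savings scale linearly with the price/throughput ratio of the two platforms, so whether CPU wins depends entirely on the actual numbers measured.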