HuaizhengZhang/AI-System-School

Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft

Issues:

First, users expect to receive inference results with low latency.
Second, when the volume of requests exceeds the capacity of a single server, the DL service must scale horizontally; see the sketch after this list.
Finally, both constraints must be satisfied under restrictions imposed by the deployment infrastructure.
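As a rough illustration of the horizontal-scaling constraint, the sketch below computes how many server replicas a service needs once traffic outgrows a single machine. The figures (`request_rate_qps`, `per_server_capacity_qps`) are hypothetical placeholders, not numbers from the paper.

```python
import math

# Hypothetical traffic and capacity figures, for illustration only.
request_rate_qps = 2500        # aggregate incoming request rate
per_server_capacity_qps = 400  # throughput one server can sustain
                               # while still meeting the latency SLA

# Horizontal scaling: once traffic exceeds what one server can handle,
# add replicas behind a load balancer until aggregate capacity suffices.
replicas = math.ceil(request_rate_qps / per_server_capacity_qps)
print(f"replicas needed: {replicas}")  # -> 7
```

Note the interaction with the first constraint: per-server capacity is measured under the latency SLA, so faster per-request inference (the point of DeepCPU) directly reduces the number of replicas required.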

Our Design:

We adopt a co-development methodology called SLT (scenario, library, and technique) to make the best use of CPU resources for business-critical scenarios while accelerating the iteration cycle of deployment and optimization.