HuaizhengZhang/AI-System-School

Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft

Issues:

First, users expect to receive inference results with low latency.
Second, when the volume of requests exceeds the capacity of a single server, the DL service must scale horizontally; see the sketch after this list.
Finally, both constraints must be satisfied under restrictions imposed by the deployment infrastructure.
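As a rough illustration of the horizontal-scaling constraint, the sketch below computes how many server replicas a service needs once traffic outgrows a single machine. The figures (`request_rate_qps`, `per_server_capacity_qps`) are hypothetical placeholders, not numbers from the paper.

```python
import math

# Hypothetical traffic and capacity figures, for illustration only.
request_rate_qps = 2500        # aggregate incoming request rate
per_server_capacity_qps = 400  # throughput one server can sustain
                               # while still meeting the latency SLA

# Horizontal scaling: once traffic exceeds what one server can handle,
# add replicas behind a load balancer until aggregate capacity suffices.
replicas = math.ceil(request_rate_qps / per_server_capacity_qps)
print(f"replicas needed: {replicas}")  # -> 7
```

Note the interaction with the first constraint: per-server capacity is measured under the latency SLA, so faster per-request inference (the point of DeepCPU) directly reduces the number of replicas required.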

Our Design:

We adopt a co-development methodology called SLT (scenario, library, and technique) to make the best use of CPU resources for business-critical scenarios while accelerating the iteration cycle of deployment and optimization.