Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft
HuaizhengZhang commented
Issues:
First, users expect to receive inference results with low latency.
Second, when the volume of requests exceeds the capacity of a single server, the DL service must scale horizontally.
Finally, these constraints come together with restrictions imposed by the deployment infrastructure.
Our Design:
We adopt a co-development methodology called SLT (scenario, library, and technique) to make the best use of CPU resources for business-critical scenarios while accelerating the iteration cycle of deployment and optimization.