The source code of our paper published at SC 2022.
ICE is an irregular collaborative serving engine that enables efficient inference execution in the cloud-edge continuum with two main modules. The model slicer adaptively slices a DNN model into pieces, and the runtime serving engine enables multi-entrance multi-exit inference to support irregular serving of model slices on the datacenter side.
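As a rough illustration, slicing a sequential DNN at chosen layer boundaries could look like the minimal sketch below (the slice_model helper and the toy CNN are hypothetical, not ICE's actual implementation):

import torch
import torch.nn as nn

def slice_model(model, cut_points):
    # Split a sequential model into slices at the given layer indices.
    bounds = [0] + list(cut_points) + [len(model)]
    return [nn.Sequential(*list(model)[a:b]) for a, b in zip(bounds, bounds[1:])]

# A toy CNN standing in for a real DNN workload.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)
slices = slice_model(model, cut_points=[2, 5])  # -> three slices

# Irregular serving: a query may enter at any slice boundary (e.g. the edge
# device already executed slice 0), and the datacenter runs only the rest.
x = torch.randn(1, 3, 32, 32)
edge_out = slices[0](x)                      # executed on the edge
cloud_out = slices[2](slices[1](edge_out))   # executed in the datacenter
print(cloud_out.shape)                       # torch.Size([1, 10])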
Hardware & Software Requirements
Hardware Requirements
- CPU: Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz
- Memory: 252 GB
- GPU: NVIDIA RTX 2080 Ti
Software Requirements
- Ubuntu 18.04
- Docker 19.03
- GPU Driver: 450.51
- CUDA 10.1
- CUDNN 7.6
- Miniconda3-py37_4.9.2
- PyTorch 1.3.0
- Download and run the provided runtime backend with Docker.
$ docker pull midway2018/ice_runtime
$ docker run -it --gpus=all --ipc=host midway2018/ice_runtime /bin/bash
$ git clone https://github.com/fkh12345/ICE.git
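Optionally, verify inside the container that the GPU is visible before continuing:
$ nvidia-smi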
- Activate conda and create a Python environment with the essential dependencies.
$ conda activate slice
$ cd ICE
$ pip install -r requirement.txt
$ # Switch to backend without batching policy
$ conda activate no-batch
$ pip install -r requirement.txt
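As an optional sanity check (not part of the provided setup), you can confirm the active environment matches the versions listed above:
$ python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"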
Running the ICE runtime consists of two steps: 1) start the inference server with the server.py script, and 2) run the emulation threads with the client1.py script.
$ conda activate slice
$ python server.py --bs <DNN_batchsize> --method ICE --progress true
$ # A new terminal
$ python client1.py --bs <num_of_queries> --load <high/medium/low> --slice true
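For instance, an illustrative run (the batch size, query count, and load level below are example values, not settings taken from the paper):
$ python server.py --bs 8 --method ICE --progress true
$ # A new terminal
$ python client1.py --bs 100 --load high --slice true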