Ventu already implements this protocol, so it can be used as the worker for deep learning inference.
This project is just a proof of concept. Check MOSEC for production usage.
- dynamic batching controlled by max batch size and max latency
- an invalid request won't affect others in the same batch
- communicates with workers through a Unix domain socket or TCP
- load balancing
If you are interested in the design, check my blog post Deep Learning Serving Framework.
go run service/app.go --help
Usage of app:
-address string
socket file or host:port (default "batch.socket")
-batch int
max batch size (default 32)
-capacity int
max jobs in the queue (default 1024)
-host string
host address (default "0.0.0.0")
-latency int
max latency (millisecond) (default 10)
-port int
service port (default 8080)
-protocol string
unix or tcp (default "unix")
-timeout int
timeout for a job (millisecond) (default 5000)
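Any of the flags above can be combined. For instance, to serve over TCP with a smaller batch window (the flag values here are illustrative, not recommendations):

```shell
go run service/app.go -protocol tcp -port 8080 -batch 16 -latency 5
```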
go run service/app.go
python examples/app.py
python examples/client.py