This document describes the process of creating, deploying, and managing PyTorch-based and non-PyTorch-based models with TorchServe. It is followed by instructions specific to each TorchServe-based model in this repository.
Each model is bundled into a `.mar` (model archive) file that contains a handler (`handler.py`), a model checkpoint (`checkpoint.pth`), additional files (packaged as `.zip`, `.tar`, `.tar.gz`, etc.), a requirements file (`requirements.txt`), and the model's code.
A `.mar` file is like a regular zip file (tip: run `unzip model.mar` and inspect the contents of the unzipped folder).
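To illustrate, the following sketch builds a minimal `.mar`-style archive with Python's standard `zipfile` module and lists its contents (the member files are placeholders, not a working model):

```python
import io
import zipfile

# A .mar is just a zip archive; build a minimal one in memory
# (the member names below are placeholders).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as mar:
    mar.writestr("handler.py", "def handle(data, context):\n    return data\n")
    mar.writestr("checkpoint.pth", b"fake-weights")
    mar.writestr("requirements.txt", "numpy\n")

# It can be listed/unpacked with any zip tool (e.g. `unzip model.mar`).
with zipfile.ZipFile(buf) as mar:
    names = mar.namelist()
```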
To build the `.mar` file we use the `torch-model-archiver` tool. Basic usage is as follows:

```shell
torch-model-archiver --model-name <model-name> --version <model_version_number> \
    --handler model_handler[:<entry_point_function_name>] \
    [--model-file <path_to_model_architecture_file>] \
    --serialized-file <path_to_state_dict_file> \
    [--extra-files <comma_separated_additional_files>] \
    [--export-path <output-dir>] [--runtime python3]
```
When the model is loaded into memory by TorchServe, it first creates a temporary directory (e.g. `/tmp/f3bab7bfaab7bfce`) and copies the contents of the `.mar` file into it. TorchServe adds this directory to `PYTHONPATH` (therefore, statements like `from handler import handle` will work).
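The effect can be sketched with plain Python: writing a `handler.py` into a temporary directory and putting that directory on `sys.path` makes the import work exactly as described (the file contents here are a stand-in, not a real handler):

```python
import importlib
import os
import sys
import tempfile

# Sketch of what the server effectively does: unpack the archive into a
# temporary directory and put that directory on the import path.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "handler.py"), "w") as f:
    f.write("def handle(data, context):\n    return data\n")

sys.path.insert(0, tmpdir)  # the equivalent of extending PYTHONPATH
handler = importlib.import_module("handler")
result = handler.handle([1, 2], None)
```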
Next, TorchServe installs the requirements from the given `requirements.txt` file. Note that the requirements file is optional, and a separate environment is created for each model with the given requirements. The temporary directory serves as the install location of the packages. In this manner, additional dependencies can be shipped along with the `.mar` file.
Once the requirements are installed, TorchServe attempts to load the model using the source code from the handler file, either via a handler class or a handler function. Please refer to the official PyTorch documentation for information on how to build a custom handler (either class-based or function-based) and package it into a `.mar` file.
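For reference, a function-based entry point is just a module-level `handle(data, context)` function. The following minimal sketch (with a placeholder instead of a real model) shows the shape such a handler takes; it is an illustration, not this repository's handler:

```python
# Minimal sketch of a function-based TorchServe entry point.
# The model-loading logic shown here is a stand-in, not a real model.
model = None

def handle(data, context):
    global model
    if model is None:
        model = object()  # placeholder for loading the real model once
    if data is None:
        return None  # the server may probe with data=None during start-up
    # Each element of `data` is one request in the server-side batch.
    return [{"echo": row} for row in data]
```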
`model-servers` defines a custom model handler for deploying PyTorch models using TorchServe. The handler contains functions and classes for model initialization, preprocessing, inference, and postprocessing that can be swapped out for custom models and used to create a `.mar` without much of the overhead usually required to package your model. We also provide helper scripts that can be used to package your model by simply refactoring your model source code.
The main components needed to package any PyTorch model are as follows:

- An initialisation function: This function should return a `torch.nn.Module` object that represents your model.
- Model checkpoint: This is your model's `state_dict` that contains the model's weights. Refer here for what a `state_dict` is and how to save your model weights.
- A requirements file: This file lists the additional Python modules that your model requires. This is usually a `requirements.txt` file containing all the packages your model uses (excluding torch/torchvision/torchaudio).
Once these three components are provided, we can swap out the appropriate modules in our custom handler with those for your model and package it into a `.mar`.
The `model-servers` handler takes care of certain things that users generally have to handle themselves by creating a custom handler. These include:

- Writing custom handlers for any new model that needs to be packaged.
- Device assignment and management.
- Limiting memory for a particular model.
- Server-side request batching.

We standardise certain aspects of TorchServe handlers and make them reusable, without the need to create a custom handler and add them separately for each new model.
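As a rough illustration of the idea, a handler factory can wrap the per-model functions once and reuse the surrounding plumbing for every model. The sketch below is a simplified, hypothetical version of such a factory in plain Python (the real `base_handler` in this repository also handles device placement, checkpoints, and memory limits):

```python
# Hypothetical, simplified sketch of a reusable handler factory.
def handler_factory(init_function, preprocess_function=None,
                    postprocess_function=None, **init_kwargs):
    state = {"model": None}

    def handle(data, context):
        if state["model"] is None:
            state["model"] = init_function(**init_kwargs)  # lazy model load
        if data is None:
            return None
        outputs = []
        for sample in data:  # server-side batch of requests
            if preprocess_function:
                sample = preprocess_function(sample)
            result = state["model"](sample)
            if postprocess_function:
                result = postprocess_function(result)
            outputs.append(result)
        return outputs

    return handle
```

With a toy "model" such as `lambda x: x * 2`, the factory applies pre- and post-processing per sample while the caller only supplies the three model-specific functions.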
To be able to use the `model-servers` handler, you must refactor your code into the format defined below. We recommend creating a separate file where you can import your model code and define the functions needed by the handler. This makes it possible to change your model without modifying the handler.
The `init_function` should be a function that initializes your PyTorch model. It is called during model loading and should return an instance of your PyTorch model (an instance of `torch.nn.Module`). The handler allows you to pass additional arguments and keyword arguments to this function.
Let us suppose we define our model in a file called `model.py` (this file, along with any other source code files, is generally passed in the `extra-files` parameter to the `.mar`):
```python
import torch

class MyNetworkBlock(torch.nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x):
        x = self.lin(x)
        x = torch.relu(x)
        return x

class MyNetwork(torch.nn.Module):
    def __init__(self, in_dim, layer_dims):
        super().__init__()
        prev_dim = in_dim
        for i, dim in enumerate(layer_dims):
            setattr(self, f'layer{i}', MyNetworkBlock(prev_dim, dim))
            prev_dim = dim
        self.num_layers = len(layer_dims)
        self.output_proj = torch.nn.Linear(layer_dims[-1], 10)

    def forward(self, x):
        for i in range(self.num_layers):
            x = getattr(self, f'layer{i}')(x)
        return self.output_proj(x)
```
Our `init_function` is declared in a separate file, for example `handler_model.py`, as follows:
```python
from model import MyNetwork

def init_function(in_dim, layer_dims):
    model = MyNetwork(in_dim, layer_dims)
    return model
```
NOTE: The `init_function` can have any number of arbitrary parameters. These are passed as keyword arguments to the `init_function` by the handler. It should also be noted that the model is loaded without pre-trained weights; these weights are loaded automatically by the handler using the model checkpoint passed to it, as described earlier.
We provide functionality to define pre-processing and post-processing functions that may be needed to modify the inputs to the model and the outputs from the model. These are simple Python functions that need to be designed to process one sample of data (since batching is handled by the handler on the server side). They are optional and can be left undefined, in which case the input data sent to the server is passed to the model as is. There are two functions that can be defined:
- Preprocessing Function (`preprocess_function`, optional): This function, if provided, preprocesses the input data before inference. It takes a single input and should return the preprocessed data.
- Postprocessing Function (`postprocess_function`, optional): This function, if provided, post-processes the model's output after inference. It takes a single input and should return the postprocessed data.
Here is a sample of model code that uses both pre-processing and post-processing:
```python
from torchvision.models import resnet18, resnet50, ResNet18_Weights, ResNet50_Weights

def init_function(resnet_layers=18):
    # Torchvision models are instances of torch.nn.Module
    if resnet_layers == 18:
        resnet = resnet18(weights=None)
    elif resnet_layers == 50:
        resnet = resnet50(weights=None)
    return resnet

def preprocess(data):
    transforms = ResNet18_Weights.DEFAULT.transforms()
    return transforms(data)

def postprocess(data):
    prediction = data.squeeze(0).softmax(0)
    class_id = prediction.argmax().item()
    score = prediction[class_id].item()
    category_name = ResNet18_Weights.DEFAULT.meta["categories"][class_id]
    return score, class_id, category_name
```
In addition to these, two other functions are available:

- Install Packages Function (`install_packages`, optional): If your model requires any external packages or dependencies that need to be installed, you can define this function to handle their installation.
- Unpack Dependencies Function (`unpack_dependencies`, optional): If your model requires any additional files or dependencies that need to be unpacked or prepared, you can define this function to handle the unpacking process.
An example of these is given below:

```python
import os

# Alternatively you can use subprocess
def unpack_dependencies():
    os.system("wget https://someserver.com/somefile.zip")
    os.system("unzip somefile.zip")

def install_packages():
    os.system("git clone https://github.com/someproject/project.git")
    os.system("project/build.sh")
```
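Since `os.system` silently ignores failures, a variant of the same example using `subprocess.run` with `check=True` will raise an error when a command fails (the URLs are the same placeholders as above):

```python
import subprocess

def unpack_dependencies():
    # check=True raises CalledProcessError if the command fails,
    # instead of silently continuing like os.system.
    subprocess.run(["wget", "https://someserver.com/somefile.zip"], check=True)
    subprocess.run(["unzip", "somefile.zip"], check=True)

def install_packages():
    subprocess.run(["git", "clone", "https://github.com/someproject/project.git"],
                   check=True)
    subprocess.run(["project/build.sh"], check=True)
```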
NOTE: `unpack_dependencies` need not be used for model source code files; these can be compressed into a `.tar.gz` file, passed to the handler, and will be uncompressed by the handler. Likewise, `pip` packages need not be installed in `install_packages`; they can be added to the `requirements.txt` file and will be installed by TorchServe before loading the model.
Once all the functions have been defined, they can be added to a single file and a handler can be created using `base_handler` [base/base_handler.py]:
```python
import os
from torchvision.models import resnet18, resnet50, ResNet18_Weights, ResNet50_Weights
from base_handler import handler_factory

def init_function(resnet_layers=18):
    # Function should return a torch.nn.Module
    if resnet_layers == 18:
        resnet = resnet18(weights=None)
    elif resnet_layers == 50:
        resnet = resnet50(weights=None)
    return resnet

def preprocess(data):
    transforms = ResNet18_Weights.DEFAULT.transforms()
    return transforms(data)

def postprocess(data):
    prediction = data.squeeze(0).softmax(0)
    class_id = prediction.argmax().item()
    score = prediction[class_id].item()
    category_name = ResNet18_Weights.DEFAULT.meta["categories"][class_id]
    return score, class_id, category_name

def unpack_dependencies():
    os.system("wget https://someserver.com/somefile.zip")
    os.system("unzip somefile.zip")

def install_packages():
    os.system("git clone https://github.com/someproject/project.git")
    os.system("project/build.sh")

handler = handler_factory(init_function=init_function,
                          preprocess_function=preprocess,
                          postprocess_function=postprocess,
                          install_packages=install_packages,
                          unpack_dependencies=unpack_dependencies,
                          resnet_layers=18)
```
NOTE: Notice how extra parameters needed to initialise the model are passed as keyword arguments to the `handler_factory`. For now this is limited to the `init_function`; we will later add options to pass arguments to all the other functions as well.
Once this model file is ready, we can use the build script [base/build.sh] to build the `.mar` from this model file. The usage of the build script is as follows:

```shell
./build.sh --model_name <model_name> --model_file <model_file> --config_file <config_file> \
    --requirements_file <requirements_file> --checkpoint_file <checkpoint_file> \
    --version <version> [--extra_files <extra_files>]
```
The parameters are defined as follows:

- `model_name`: A unique name given to the model. TorchServe hosts the model at an endpoint based on this name (`https://torchservehost/models/<model_name>`).
- `model_file`: The model file we defined in the previous section that contains the `init_function`, `preprocess_function`, etc., and the handler instance.
- `config_file`: The model-config file of the model. This file can be used to define extra model parameters; it is the model-config defined by TorchServe. We describe the extra parameters that can be passed via this config file to the `base_handler` to control certain aspects of your model in the next section.
- `requirements_file`: The `requirements.txt` file that lists all the Python packages needed by your model. These are installed before your model is loaded.
- `checkpoint_file`: Your model's saved `state_dict`, usually a `.pt` or `.pth` file.
- `version`: Your model's version, usually `1.0`, `2.0`, etc. TorchServe uses this version to differentiate between different instances of the same model (`https://torchservehost/models/<model_name>/<version>`).
- `extra_files`: Any extra source code files that your `model_file` needs, compressed into a `.tar.gz` file. These are unpacked into the same folder as `model_file`, maintaining the directory structure they were packed with. This parameter also needs to be added to the model-config as defined in the next section. Please use `tar -czvf <extra_files> file1 file2 directory/` to compress your files.
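The `extra_files` archive can also be produced programmatically; this sketch uses Python's standard `tarfile` module to pack two illustrative source files and verify the stored paths:

```python
import io
import tarfile

# Build a .tar.gz of extra source files in memory (the names are
# illustrative; on the command line this corresponds to
# `tar -czvf extra.tar.gz utils.py model_parts/`).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, payload in [("utils.py", b"VALUE = 1\n"),
                          ("model_parts/block.py", b"SIZE = 8\n")]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

# The handler unpacks such an archive next to model_file, preserving
# the relative paths stored in it.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    members = tar.getnames()
```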
We explain the model-config and its parameters below. This file is defined by TorchServe; we add a few extra parameters to control things like the model memory limit and the source code files. Here is a sample model-config:
```yaml
# TorchServe controls. These parameters are used to control model workers, batching, etc.
# They are optional; if not provided, TorchServe will use default values.
minWorkers: 1 # Minimum number of workers needed for the model
maxWorkers: 1 # Maximum number of workers for the model; this is the upper limit to which TorchServe will scale your model.
maxBatchDelay: 4000 # The maximum delay (in msec) that TorchServe waits to batch requests. Any requests received within this time limit will be batched, up to batchSize.
responseTimeout: 5200 # The maximum delay before TorchServe returns a failed response. Please set this appropriately.
batchSize: 4 # The batch size of the model.
deviceType: "gpu" # The device that the model needs to be placed on: [cpu, gpu, neuron]. With multiple GPUs on the system, the device is assigned automatically by TorchServe.
deviceIds: [0,1,2,3] # GPU device ids allocated to this model.

# These are extra parameters that we define for our model-servers handler.
system:
  extraFiles: "test.tar.gz" # The extra source code files; adding them here tells the handler which file to unpack.
  maxGPUMemory: 500 # Max GPU memory needed by the model in MB. If set, the model will not be able to access more than this limit.
```