LAC deployment speed investigation
rickstaa opened this issue · 1 comments
Let's investigate how fast our model is when we deploy it. Let's look at two different models, a normal LAC and LAC with a CNN input layer.
MLP forward speed
Let's first check the forward pass speed of the normal LAC agent.
CPU: 0.97 ms = 1030 Hz
GPU: 0.87 ms = 1150 Hz
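For reference, the frequencies quoted in this issue are just the inverse of the per-pass latency. A minimal sketch of that conversion follows; the helper name `latency_ms_to_hz` is made up for illustration and is not part of the LAC code base.

```python
def latency_ms_to_hz(latency_ms: float) -> float:
    """Convert a per-call latency in milliseconds to a rate in Hz."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    # 1 s = 1000 ms, so f [Hz] = 1000 / t [ms].
    return 1000.0 / latency_ms


# Latencies of the MLP forward pass reported above.
for device, latency_ms in [("CPU", 0.97), ("GPU", 0.87)]:
    print(f"{device}: {latency_ms} ms -> {latency_ms_to_hz(latency_ms):.0f} Hz")
```

The same helper can be applied to any latency measured in this issue to check the attainable control frequency.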
CNN forward speed
Let's put a small convolutional neural network in front of the normal LAC model. Let's now perform 1e2 forward passes to get the average forward pass time. Let's do this both on the GPU and the CPU. For this, let's take an input image of size 3x128x128 (I think this is still quite small, as for example the GQCNN grasping network uses 3x512x424). Let's make the CNN output layer 18 neurons big:
CPU: 812 ms = 1.21 Hz
GPU: 91.83 ms = 10.98 Hz
If we make the CNN output layer 10 neurons big we get:
CPU: 812 ms = 2 Hz
GPU: 69.59 ms = 14.5 Hz
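The averaging over 1e2 forward passes described above could be sketched roughly as follows. This is not the script that produced the numbers in this issue; `mean_latency_ms` and its parameters are hypothetical names, and the dummy workload only stands in for the real forward pass. In a PyTorch setting one would pass something like `lambda: model(x)` as `fn` and, on the GPU, `torch.cuda.synchronize` as `sync`, since CUDA kernels run asynchronously.

```python
import time
from typing import Callable, Optional


def mean_latency_ms(
    fn: Callable[[], object],
    n_runs: int = 100,
    n_warmup: int = 10,
    sync: Optional[Callable[[], object]] = None,
) -> float:
    """Return the mean wall-clock time of fn() in milliseconds."""
    # Warm-up calls are not timed (caches, lazy initialisation, ...).
    for _ in range(n_warmup):
        fn()
    if sync is not None:
        sync()  # make sure pending asynchronous work is done before timing
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    if sync is not None:
        sync()  # wait for the timed work to finish before stopping the clock
    elapsed_s = time.perf_counter() - start
    return 1000.0 * elapsed_s / n_runs


# Stand-in workload; replace with the real forward pass.
latency = mean_latency_ms(lambda: sum(i * i for i in range(10_000)))
print(f"mean latency: {latency:.3f} ms = {1000.0 / latency:.1f} Hz")
```

Timing the whole loop once, instead of each call separately, keeps the timer overhead out of the average.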
Conclusion
The results above show that the image size and the CNN output layer size strongly affect the forward pass execution speed. The achievable control frequency therefore depends heavily on how large an image we need for a good representation of the world. We can try the following to speed up the forward pass:
- Check the performance on the Google Coral TPU. This body part segmentation algorithm achieves 30 fps on the Google Coral TPU with an image size of 1280x720, so there might be ways to speed this up. The Coral accelerator, however, only works with TensorFlow Lite models. For PyTorch we can use the Jetson Nano GPU instead.
- We can use grayscale images. This reduces the input size by a factor of 3.
- We can decrease the CNN output size.
- We can use TorchScript to run a compiled version of the network. This, however, is not yet possible in the current version of PyTorch, as the distributions package used in the Gaussian actor is not yet TorchScript compatible (see this issue).
- Switch to TensorFlow Lite. I do not know if the performance will be better.
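To put a number on the grayscale option from the list above, the sketch below counts the input elements and the first-layer convolution weights for an RGB and a grayscale 128x128 image. The kernel size (3) and the number of output channels (16) are assumptions made for illustration only; the actual CNN used in these tests is not specified in this issue.

```python
def input_elements(channels: int, height: int, width: int) -> int:
    """Number of values in one input image."""
    return channels * height * width


def conv_weight_count(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights of a 2D convolution layer with square kernels (bias excluded)."""
    return in_channels * out_channels * kernel_size * kernel_size


# Assumed first layer: 16 output channels, 3x3 kernels (hypothetical values).
for name, channels in [("RGB", 3), ("grayscale", 1)]:
    n_in = input_elements(channels, 128, 128)
    n_w = conv_weight_count(channels, out_channels=16, kernel_size=3)
    print(f"{name}: {n_in} input values, {n_w} first-layer weights")
```

Only the first convolution layer gets cheaper by the factor of 3. The later layers keep their size, so the overall speed-up will likely be smaller than 3x and should be measured.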
Closing for now.