MIC-DKFZ/nnDetection

[Question] Installation without docker and model deployment.

ArpanGyawali opened this issue · 19 comments

❓ Question

I am sorry for the back-to-back questions, but this is very important for me.
I previously used RetinaNet for detection on 2D data, but I have now shifted to 3D data.
I am using a remote machine with large shared resources that does not allow the use of Docker. I want to use nnDetection for my LVO detection task. Is there any other way to install it that allows preprocessing, unpacking, training, inference, and deployment?

Also, after my model is built on a custom dataset, I want to deploy the model for inference. Is that possible?

There is a dedicated section in the README of the repository for the installation of nnDetection without Docker. There are some requirements for other tools which need to be installed. It should be possible to install CUDA via Conda by now, but we haven't explored that direction yet.

What kind of deployment are you thinking about? For deployment I would usually recommend a Docker container to avoid all the CUDA requirements.
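
(For reference, a rough sketch of what the non-Docker source install from the README looks like; the README's installation section is the authoritative reference, and the exact package versions and CUDA setup depend on your system.)

# Rough sketch of a source install, assuming the steps from the README.
# A matching CUDA toolkit is typically required on the machine, since
# nnDetection compiles custom C++/CUDA extensions during installation.
git clone https://github.com/MIC-DKFZ/nnDetection.git
cd nnDetection
pip install -r requirements.txt
pip install -e .

# nnDetection reads its data and model locations from environment variables;
# adjust the paths to your setup.
export det_data=/path/to/nnDet_data
export det_models=/path/to/nnDet_models
export OMP_NUM_THREADS=1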

Thank you. I didn't see it earlier.
Yes, but do I need to convert the model to TorchScript and use Docker to deploy it? What inference script should I use so that the deployed model can run on new data from any device?

I don't think TorchScript will run out of the box, since nnDetection was intended as a research platform rather than something that can be easily deployed. You might need to adapt the models to make TorchScript work nicely with the code.

nnDetection provides an entrypoint to run inference on test data.

Yes, there is an entrypoint, but for that the end user needs to install nnDetection, right?
What do you mean by adapting the model?

If you package it in a docker container, the user only needs a (nvidia-)docker installation and a compatible GPU driver.

AFAIK TorchScript does not support all operations, and thus changes might need to be made to the models to make them compatible with it.
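
(For illustration, running such a packaged container on a machine with a GPU would look roughly like the sketch below; the image name, mount paths, and inner command are placeholders, not an official nnDetection image.)

# Hypothetical run command for a packaged inference image.
# Requires the NVIDIA Container Toolkit on the host; names and paths are placeholders.
docker run --gpus all \
    -v /path/to/input:/input \
    -v /path/to/output:/output \
    my-nndetection-image:latest   # append your usual inference command here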

@mibaumgartner Does that mean inference cannot be performed on a device without GPU support?
Is a GPU mandatory for inference as well, or is there a way inference/prediction can be done on new data using a CPU?

GPU is mandatory for inference for now

@mibaumgartner When will inference on CPU be available in a later release?

I think you can already technically run nnDetection on CPU, but I wouldn't recommend it for now (at least not in the default configuration) since the inference time will be very high. I would also consider this more of an experimental use case.

The primary focus of nnDetection is currently research, and we haven't thought about alternative deployment strategies beyond Docker + GPU.

Hello,

Just to try it and see the inference time on CPU, could you help me with how to do this, please? Is there a parameter somewhere?

Best regards

[Please note this is highly experimental and not recommended!]
You need to change

device: torch_device = "cuda:0",

which can be passed here:

https://github.com/MIC-DKFZ/nnDetection/blob/d41b5c0d64b6c7c85ca238373dd8d53121aa194d/scripts/predict.py#L101C28-L101C44

So in total you should be able to run inference on CPU by using nndet_predict [...] -o +inference_kwargs.device='cpu'
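
(For illustration, a complete call might look like the sketch below; the task ID and model name are placeholders for whatever you normally pass to nndet_predict, and only the -o override at the end is the relevant part.)

# Hypothetical invocation: "000" and the model name are placeholders for your own task/model.
nndet_predict 000 RetinaUNetV001_D3V001_3d -o +inference_kwargs.device='cpu'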

Thank you for your quick answer. This is more or less what I did just before I saw your comment, but by hardcoding "cpu" in the Predictor() class at line 53 of this file and rebuilding a Docker image. A bit dirtier, but doing the same thing, I guess.

While in GPU mode the prediction speed is ~0.22 seconds per iteration/patch (37/128 [00:10<00:20, 4.52it/s]), in CPU mode it is ~26 seconds per iteration/patch (1/128 [00:25<54:56, 25.96s/it]). This takes the total time per case from about 1 minute (GPU) to about 1 hour (CPU). Is this behavior expected? It seems very long to me, no?
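
(As a quick sanity check of these numbers, assuming the 128 patches per case shown in the progress bars:)

# Back-of-the-envelope check using the per-patch times reported above.
python -c "print('GPU: %.1f min/case' % (128 * 0.22 / 60)); print('CPU: %.1f min/case' % (128 * 26 / 60))"
# ~0.5 min/case on GPU vs ~55 min/case on CPU, i.e. a factor of roughly 120.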

Do you have any hints in mind for deployment strategies that are relatively simple? I already did something in C++ with TensorRT and OpenVINO for nnU-Net. Maybe extending the scope of this to nnDetection would be worthwhile (hard) work if there is no simpler alternative?

Thank you

To my knowledge, GPUs are significantly quicker than CPUs (by a factor of 100-200), so the results seem somewhat reasonable. There are many different factors influencing the total processing time, like patch size, batch size, CPU cores, etc.
Since we always used GPUs for inference, I unfortunately don't have any reference values.

I would suspect that support for TensorRT etc. is more difficult for RetinaU-Net, since it is not a straightforward PyTorch neural network where the output of the network is the final prediction; it includes additional postprocessing steps which might be more complex for such tools (at least Torchvision includes several functions to allow for proper tracing). Since the main goal of nnDetection is to allow for quick turnaround times in research, those functions are currently not included and we do not have tests in place to ensure proper tracing of the model.

Thank you for confirming that my results with native nnDetection are OK. As for TensorRT etc., we will see if we do it; it does seem a bit harder, but for nnU-Net it was a very significant improvement in inference time (a 5x to 10x speedup).

Have a good day!

Hello, I decided to proceed in two steps: first produce a Docker image with a bare minimum of modifications to have something working, then try the tricky things with TensorRT.

For the first step, using Docker:

Building the Docker image as explained on the main GitHub page, I end up with a +15 GB image. Is this large size normal, or have I missed something? I'd only like to use this image for inference, so I guess a lot of dependencies are unnecessary for me here (e.g., training-related things)? Do you have any tips on how to reduce the image size, please?

Thank you very much for your help and answers to all my questions!

Hi @Thibescobar ,

we use the NVIDIA NGC container ('nvcr.io/nvidia/pytorch:21.11-py3'), which is already really big (~12.5 GB). You could try to reduce the size by moving to a slimmer base container and installing your own dependencies; e.g., it might be worth trying an NVIDIA CUDA Docker base image (or, if only CPU is needed, something even slimmer might be possible).
The NVIDIA NGC container already includes quite a lot of additional things like TensorRT.

Best,
Michael
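
(As a starting point for trimming the image, it can help to see where the size comes from; the sketch below shows the general direction only, with a placeholder image tag, and is not a tested recipe.)

# Inspect which layers contribute most to the image size
# (replace the tag with whatever you built).
docker history nndetection:latest

# A slimmer variant would swap the NGC base (nvcr.io/nvidia/pytorch:21.11-py3)
# for a plain NVIDIA CUDA base image and install only the inference
# dependencies on top; this requires adapting the repository's Dockerfile
# and is untested here.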

Thank you very much

This issue is stale because it has been open for 30 days with no activity.

This issue was closed because it has been inactive for 14 days since being marked as stale.