hkchengrex/CascadePSP

How to Convert code to C++

0yueyunfei0 opened this issue · 12 comments

Hi, I really like your code; it works particularly well on our model. We now need to deploy the model to TensorRT in C++. A PyTorch model like PSPNet can be converted quickly, but how can a method like process_high_res_im be written with the TensorRT (C++) API? Thanks a lot!

Hi,

I don't have a lot of experience in TensorRT so my suggestion would probably be sub-optimal...
process_high_res_im is essentially just cropping/stitching -- I think these can be done in C++ (maybe OpenCV's GPU functions are sufficient) or with custom CUDA kernels. That way, only the PSPNet part needs to be converted and optimized by TensorRT, while the rest stays in vanilla C++/CUDA.
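
If it helps, here is a minimal C++ sketch of that crop-and-stitch idea using OpenCV ROIs. The patch size, stride, overlap averaging, and the runRefiner stub are illustrative assumptions, not CascadePSP's actual process_high_res_im logic:

```cpp
#include <algorithm>
#include <opencv2/opencv.hpp>

// Stub for the TensorRT-optimized PSPNet refiner: takes a CV_32FC3 image
// patch and a CV_32FC1 coarse-mask patch, returns a refined CV_32FC1 mask
// of the same size. Replace the body with real engine inference.
static cv::Mat runRefiner(const cv::Mat& imgPatch, const cv::Mat& segPatch) {
    (void)imgPatch;
    return segPatch.clone();  // placeholder
}

// Sliding-window crop/stitch: refine overlapping patches and average them
// back into a full-resolution mask.
cv::Mat refineHighRes(const cv::Mat& image, const cv::Mat& coarseSeg,
                      int patch = 224, int stride = 112) {
    cv::Mat acc   = cv::Mat::zeros(image.size(), CV_32FC1);  // summed predictions
    cv::Mat count = cv::Mat::zeros(image.size(), CV_32FC1);  // patches per pixel

    for (int y = 0; y < image.rows; y += stride) {
        for (int x = 0; x < image.cols; x += stride) {
            // Clamp the window to the image so source and destination ROIs
            // always have identical sizes.
            int sy = std::min(y, std::max(image.rows - patch, 0));
            int sx = std::min(x, std::max(image.cols - patch, 0));
            int h  = std::min(patch, image.rows - sy);
            int w  = std::min(patch, image.cols - sx);
            cv::Rect roi(sx, sy, w, h);

            cv::Mat refined = runRefiner(image(roi), coarseSeg(roi));
            cv::Mat accRoi  = acc(roi);    // header sharing acc's data
            cv::Mat cntRoi  = count(roi);
            accRoi += refined;             // accumulate in place
            cntRoi += cv::Scalar(1);
        }
    }
    cv::Mat out;
    cv::divide(acc, count, out);  // average overlapping predictions
    return out;
}
```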

Thank you for your reply, I will try these suggestions and report back to you with progress

I've serialized the model through PyTorch -> ONNX -> TensorRT (C++), rewritten most of the cropping/stitching with OpenCV, and rewritten AdaptiveAvgPool2d to be compatible with the ONNX operators.
After two days of attempts, I have run into two tricky problems.
The first is the dynamically changing number of inputs in RefinementModule's forward, because the inter_s8=None and inter_s4=None defaults make those inputs optional. Although I solved dynamic shapes in TensorRT, I don't know how to handle a dynamic number of inputs.
The second is in combined_224[:, :, start_y:end_y, start_x:end_x] += grid_pred_224[:, :, pred_sy:pred_ey, pred_sx:pred_ex].
While debugging this line I found that pred_sy:pred_ey is larger than start_y:end_y, which means copying a larger array into a smaller one.
How can these two problems be solved in C++?
Many thanks again; I'm sorry to bother you, but this code is really important to us.

  1. If you are following the global/local procedure, then there are only two types of forward passes -- one with (img, seg) used in the global step, and the other with (img, seg, inter_s8) used in the local step. Maybe you can split our forward function into two versions, such that each one of them has a fixed number of inputs (see the sketch after this list)? If that incurs extra memory cost, note that model.feats and model.psp are called (maybe multiple times) in all kinds of forwards -- so maybe they can be shared.

  2. That should not happen. The two array sizes should always be the same, or else even the Python code would not work.
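
Concretely, that split leaves the C++ side with two engines that each have a fixed input list, plus a small dispatch that mirrors the Python inter_s8=None default. A minimal sketch, where Engine and its infer method are placeholders for whatever TensorRT wrapper you use, not CascadePSP's actual API:

```cpp
#include <vector>
#include <opencv2/opencv.hpp>

// Placeholder for a TensorRT engine wrapper: one cv::Mat blob per input
// binding in, the refined mask out. Replace the stub with real inference.
struct Engine {
    cv::Mat infer(const std::vector<cv::Mat>& inputs) {
        return inputs.back().clone();  // stub
    }
};

// Mirrors forward(img, seg, inter_s8=None): with no intermediate prediction,
// run the 2-input global engine; otherwise run the 3-input local engine.
// Each engine keeps a fixed number of inputs.
cv::Mat refine(Engine& globalEngine, Engine& localEngine,
               const cv::Mat& img, const cv::Mat& seg,
               const cv::Mat* interS8 = nullptr) {
    if (interS8 == nullptr)
        return globalEngine.infer({img, seg});
    return localEngine.infer({img, seg, *interS8});
}
```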

[screenshot]
Almost Done!
After splitting into two models, some confusing trial and error with NCHW vs. BGR between OpenCV and TensorRT, and a day of debugging with TensorRT, we finally have initial success. Thank you very much!
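
On the NCHW/BGR point: OpenCV stores images as interleaved HWC in BGR order, while the ONNX/TensorRT input expects a planar NCHW float blob. cv::dnn::blobFromImage does the conversion in one call; swapRB=true below assumes the network expects RGB channel order (typical for PyTorch-trained models), which should be checked against your export:

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>

// Convert an 8-bit BGR HWC cv::Mat into a 1x3xHxW CV_32F blob.
cv::Mat toNCHW(const cv::Mat& bgrImage) {
    return cv::dnn::blobFromImage(
        bgrImage,
        1.0 / 255.0,        // scale 8-bit values to [0,1]; mean/std come later
        bgrImage.size(),    // keep the original resolution
        cv::Scalar(),       // no mean subtraction here
        /*swapRB=*/true,    // BGR -> RGB
        /*crop=*/false,
        CV_32F);
}
```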

Finally, I found that the problem is that reading float pixels from an OpenCV Mat in C++ should be done with mat.at(i, j) instead of mat.data[i*width+j]; the TensorRT sample was misleading me.
The strange thing is that my previous TRT project, with the same wrong input method, still produced correct output. Maybe the input ranges of the two models are different: CascadePSP needs image values in roughly [-2,2] and the mask in [-1,1].
Anyway, I am getting good results with the image value range [0,1] and mask [-1,1].
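
For reference, a minimal sketch of the typed access, assuming a CV_32FC3 image: mat.data is a raw uchar*, so mat.data[i*width+j] ignores the float element size, the channel count, and the row stride, while the typed accessors account for all three.

```cpp
#include <opencv2/opencv.hpp>

// Copy a CV_32FC3 cv::Mat into a planar (CHW) float buffer.
void copyToCHW(const cv::Mat& img, float* dst) {
    CV_Assert(img.type() == CV_32FC3);
    const int H = img.rows, W = img.cols;
    for (int c = 0; c < 3; ++c) {
        for (int y = 0; y < H; ++y) {
            const cv::Vec3f* row = img.ptr<cv::Vec3f>(y);  // stride-aware row pointer
            for (int x = 0; x < W; ++x) {
                // Equivalent element access: img.at<cv::Vec3f>(y, x)[c]
                dst[c * H * W + y * W + x] = row[x][c];
            }
        }
    }
}
```
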
[screenshot of the first Global Step output]

I wouldn't call this blurry result good...
You can compare with our Python implementation to check if your implementation is correct.

Sorry, this is only the first Global Step output. I will finish the Local Step as soon as possible and let you know the final result.

Ah, you don't have to say sorry -- I am not your boss. Chill.
It looks blurry even for just the global step.

[screenshot of the improved Global Step result]
I'm now pretty sure that the value range produced by the preprocessing of the image and mask can drastically affect the performance of the network. I mentioned above that I got a very blurry result when the image value range was [0,1]; I just need to normalize it with mean=0.45 and std=0.225 (which is what self.im_transform does), and then I get a very nice result in the first Global Step.
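
For completeness, a small C++ sketch of that normalization, using the mean=0.45, std=0.225 values and the [-1,1] mask mapping mentioned above:

```cpp
#include <opencv2/opencv.hpp>

// Normalize a [0,1] CV_32FC3 image to roughly [-2,2]: x -> (x - 0.45) / 0.225,
// matching the per-channel values used by self.im_transform on the Python side.
cv::Mat normalizeImage(const cv::Mat& img01) {
    cv::Mat out;
    img01.convertTo(out, CV_32FC3, 1.0 / 0.225, -0.45 / 0.225);
    return out;
}

// Map a [0,1] CV_32FC1 mask to [-1,1]: x -> 2x - 1.
cv::Mat normalizeMask(const cv::Mat& mask01) {
    cv::Mat out;
    mask01.convertTo(out, CV_32FC1, 2.0, -1.0);
    return out;
}
```
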
I should finish all the modules soon, and I'm excited to see how CascadePSP performs in TRT!

After four days of working on it almost all day, I think I have managed to convert CascadePSP, the great Refiner, to TensorRT + OpenCV, all written in C++.
Thanks for your advice and your great code. Thanks again!
I may open-source my TensorRT C++ implementation in a few days, as soon as our competition is over.
[screenshot]

Hi, would you mind if I ask about the time cost after your conversion from the Python version to the TensorRT C++ version?