OpenPPL/ppl.nn

pplnn fails to run mobilenet v2 model (using CUDA)

shiwenloong opened this issue · 7 comments

What are the problems?(screenshots or detailed error messages)

pplnn fails to run the mobilenet v2 model (using CUDA). The mobilenet v2 model was exported from torchvision.

ppl.nn version: [0.9.0], commit: [2da19ac438d4f726b8744d650a1751d310fc0710-dirty]
[INFO][2022-12-04 17:42:46.453][pplnn.cc:308] ***** register CudaEngine *****
[INFO][2022-12-04 17:42:46.474][utils.cc:369] total partition(s) of graph[torch_jit]: 1.
[INFO][2022-12-04 17:42:46.478][opt_graph.cc:312] added 242 new bridge kernels
[INFO][2022-12-04 17:42:46.509][algo_conv_hmma.cc:141] Compiling /features/features.0/features.0.0/Conv
[INFO][2022-12-04 17:42:51.219][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b64x32_w32x16_k32_s16
[INFO][2022-12-04 17:42:51.239][algo_conv_hmma.cc:141] Compiling /features/features.1/conv/conv.1/Conv
[INFO][2022-12-04 17:42:55.559][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b64x16_w32x8_k64_s32
[INFO][2022-12-04 17:42:55.650][algo_conv_hmma.cc:141] Compiling /features/features.2/conv/conv.0/conv.0.0/Conv
[INFO][2022-12-04 17:42:58.170][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b128x32_w32x16_k32_s32
[INFO][2022-12-04 17:42:58.184][algo_conv_hmma.cc:141] Compiling /features/features.2/conv/conv.2/Conv
[INFO][2022-12-04 17:43:00.891][algo_conv_hmma.cc:146] select kernel nv2spkSm75Fp16Conv_hmma1688_nhwc_f1_b32x16_w16x16_k64_s32_buf2
[INFO][2022-12-04 17:43:00.921][algo_conv_hmma.cc:141] Compiling /features/features.3/conv/conv.0/conv.0.0/Conv
[INFO][2022-12-04 17:43:06.278][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b64x32_w32x8_k32_s32
[INFO][2022-12-04 17:43:06.289][algo_conv_hmma.cc:141] Compiling /features/features.3/conv/conv.2/Conv
[INFO][2022-12-04 17:43:06.524][algo_conv_hmma.cc:146] select kernel nv2spkSm75Fp16Conv_hmma1688_nhwc_f1_b64x8_w64x8_k128_s32_buf1
[INFO][2022-12-04 17:43:06.557][algo_conv_hmma.cc:141] Compiling /features/features.4/conv/conv.0/conv.0.0/Conv
[INFO][2022-12-04 17:43:12.012][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b128x32_w64x8_k32_s32
[INFO][2022-12-04 17:43:12.017][algo_conv_hmma.cc:141] Compiling /features/features.4/conv/conv.2/Conv
Segmentation fault (core dumped)

What are the types of GPU/CPU you are using?

RTX 2080 Ti

What's the operating system ppl.nn runs on?

Ubuntu 18.04

What's the compiler and its version?

g++ 7.5.0
nvcc V10.2.89

Which version(commit id or tag) of ppl.nn is used?

2da19ac-dirty

What are the commands used to build ppl.nn?

cmake .. -DCMAKE_BUILD_TYPE=Release -DPPLNN_USE_CUDA=ON -DCMAKE_INSTALL_PREFIX=install
cmake --build . -j 20 --config Release
cmake --build . --target install -j 20 --config Release

What are the execution commands?

./pplnn-build/tools/pplnn --use-cuda --onnx-model=mobilenet_v2.onnx --kernel-type=float16 --export-algo-file=algos/mobilenet_v2_fp16.json

Minimal code snippets for reproducing these problems (if necessary)

import torch
import torchvision

# Export a pretrained MobileNetV2 to ONNX. torchvision >= 0.13 expects the
# weights to be passed as a keyword argument.
model = torchvision.models.mobilenet_v2(weights=torchvision.models.MobileNet_V2_Weights.DEFAULT)
model.eval()  # switch BatchNorm/Dropout to inference mode before export
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "mobilenet_v2.onnx",
    input_names=["inp"],
    output_names=["out"],
    opset_version=11,
)
./pplnn-build/tools/pplnn --use-cuda --onnx-model=mobilenet_v2.onnx  --kernel-type=float16 --export-algo-file=algos/mobilenet_v2_fp16.json
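
As an extra sanity check (not part of the original report), the exported file can be run once with onnxruntime, assuming that package is installed, to confirm the ONNX model itself is valid before passing it to pplnn; the input name "inp" matches the export call above.

import numpy as np
import onnxruntime as ort

# Load the exported model and run one dummy batch on CPU.
sess = ort.InferenceSession("mobilenet_v2.onnx", providers=["CPUExecutionProvider"])
out = sess.run(None, {"inp": np.random.randn(1, 3, 224, 224).astype(np.float32)})
print(out[0].shape)  # expected: (1, 1000)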

Models and inputs for reproducing these problems (send them to openppl.ai@hotmail.com if necessary)

Si-XU commented

Since we don't have a 2080 Ti right now, we cannot reproduce your error locally. There are two workarounds I have used to skip this kind of error.

First, you can test whether the error still occurs with the default kernels:

./pplnn-build/tools/pplnn --use-cuda --onnx-model=mobilenet_v2.onnx --kernel-type=float16 --quick-select

Or, you can rebuild ppl.nn without JIT, then test again:

cmake .. -DCMAKE_BUILD_TYPE=Release -DPPLNN_USE_CUDA=ON -DCMAKE_INSTALL_PREFIX=install -DPPLNN_ENABLE_CUDA_JIT=OFF

--quick-select

Using --quick-select does fix the crash, but the measured performance is poor: running mobilenet v2 with ppl.nn's default kernels takes 0.78 ms, while TensorRT takes 0.39 ms.

Can I use kernel tuning on mobilenet v2?
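
(For reference, the usual kernel-tuning workflow is to pay the slow compile/selection cost once, export the selected algorithms, and reuse that file on later runs. The --export-algo-file option already appears above; the matching --import-algo-file option is assumed to be available in this build.)

# Tune once and save the selected kernels (the slow step):
./pplnn-build/tools/pplnn --use-cuda --onnx-model=mobilenet_v2.onnx --kernel-type=float16 --export-algo-file=algos/mobilenet_v2_fp16.json

# Reuse the saved selection on later runs instead of tuning again:
./pplnn-build/tools/pplnn --use-cuda --onnx-model=mobilenet_v2.onnx --kernel-type=float16 --import-algo-file=algos/mobilenet_v2_fp16.json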

Si-XU commented

If you care about performance, you can rebuild ppl.nn without JIT. I think this should also fix the problem you ran into.

cmake .. -DCMAKE_BUILD_TYPE=Release -DPPLNN_USE_CUDA=ON -DCMAKE_INSTALL_PREFIX=install -DPPLNN_ENABLE_CUDA_JIT=OFF

Thanks. The performance of ppl.nn without JIT is great.
Does ppl.nn only support fp16 on the 2080 Ti? I tried to set the kernel type to fp32, but it failed.

./build-no-jit/tools/pplnn  --use-cuda --onnx-model=resnet18.onnx --kernel-type=float32
[ERROR][2022-12-07 14:42:06.426][sequential_scheduler.cc:129] exec kernel[/maxpool/MaxPool] of type[:MaxPool:11] failed: unsupported
[ERROR][2022-12-07 14:42:06.426][runtime_impl.cc:337] Run() failed: unsupported
[ERROR][2022-12-07 14:42:06.426][pplnn.cc:1315] Run() failed: unsupported

Si-XU commented

We only support fp16/int8 kernels for conv right now, so the run failed before it even reached the conv.
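
In other words, keeping --kernel-type=float16 (as in the earlier mobilenet v2 runs) should presumably work for resnet18 as well, e.g.:

./build-no-jit/tools/pplnn --use-cuda --onnx-model=resnet18.onnx --kernel-type=float16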

OK. Thanks for your reply.

Si-XU commented

Would you mind sending your broken model to openppl.ai@hotmail.com?