pplnn run mobilenet v2 model failed. (use cuda)
shiwenloong opened this issue · 7 comments
What are the problems? (screenshots or detailed error messages)
pplnn fails to run a MobileNet V2 model with CUDA. The model was exported from torchvision.
ppl.nn version: [0.9.0], commit: [2da19ac438d4f726b8744d650a1751d310fc0710-dirty]
[INFO][2022-12-04 17:42:46.453][pplnn.cc:308] ***** register CudaEngine *****
[INFO][2022-12-04 17:42:46.474][utils.cc:369] total partition(s) of graph[torch_jit]: 1.
[INFO][2022-12-04 17:42:46.478][opt_graph.cc:312] added 242 new bridge kernels
[INFO][2022-12-04 17:42:46.509][algo_conv_hmma.cc:141] Compiling /features/features.0/features.0.0/Conv
[INFO][2022-12-04 17:42:51.219][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b64x32_w32x16_k32_s16
[INFO][2022-12-04 17:42:51.239][algo_conv_hmma.cc:141] Compiling /features/features.1/conv/conv.1/Conv
[INFO][2022-12-04 17:42:55.559][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b64x16_w32x8_k64_s32
[INFO][2022-12-04 17:42:55.650][algo_conv_hmma.cc:141] Compiling /features/features.2/conv/conv.0/conv.0.0/Conv
[INFO][2022-12-04 17:42:58.170][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b128x32_w32x16_k32_s32
[INFO][2022-12-04 17:42:58.184][algo_conv_hmma.cc:141] Compiling /features/features.2/conv/conv.2/Conv
[INFO][2022-12-04 17:43:00.891][algo_conv_hmma.cc:146] select kernel nv2spkSm75Fp16Conv_hmma1688_nhwc_f1_b32x16_w16x16_k64_s32_buf2
[INFO][2022-12-04 17:43:00.921][algo_conv_hmma.cc:141] Compiling /features/features.3/conv/conv.0/conv.0.0/Conv
[INFO][2022-12-04 17:43:06.278][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b64x32_w32x8_k32_s32
[INFO][2022-12-04 17:43:06.289][algo_conv_hmma.cc:141] Compiling /features/features.3/conv/conv.2/Conv
[INFO][2022-12-04 17:43:06.524][algo_conv_hmma.cc:146] select kernel nv2spkSm75Fp16Conv_hmma1688_nhwc_f1_b64x8_w64x8_k128_s32_buf1
[INFO][2022-12-04 17:43:06.557][algo_conv_hmma.cc:141] Compiling /features/features.4/conv/conv.0/conv.0.0/Conv
[INFO][2022-12-04 17:43:12.012][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b128x32_w64x8_k32_s32
[INFO][2022-12-04 17:43:12.017][algo_conv_hmma.cc:141] Compiling /features/features.4/conv/conv.2/Conv
Segmentation fault (core dumped)
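The log stops right after a "Compiling" line that has no matching "select kernel" line, which suggests the crash happened while that node's kernel was being compiled. A minimal sketch for pulling that node name out of a saved log follows. The helper name `last_unfinished_compile` is hypothetical, and the patterns assume the log format shown in this report.

```python
import re

# Patterns follow the log lines quoted in this report; other pplnn versions may word them differently.
COMPILING = re.compile(r"\] Compiling (?P<node>\S+)")
SELECTED = re.compile(r"\] select kernel (?P<kernel>\S+)")


def last_unfinished_compile(log_text):
    """Return the last node whose 'Compiling' line has no later 'select kernel' line, or None."""
    pending = None
    for line in log_text.splitlines():
        match = COMPILING.search(line)
        if match:
            pending = match.group("node")
        elif SELECTED.search(line):
            pending = None
    return pending


# The last four lines of the log above.
SAMPLE_LOG = """\
[INFO][2022-12-04 17:43:06.557][algo_conv_hmma.cc:141] Compiling /features/features.4/conv/conv.0/conv.0.0/Conv
[INFO][2022-12-04 17:43:12.012][algo_conv_hmma.cc:146] select kernel nvIdxnSm75Fp16Conv_hmma1688_nhwc_b128x32_w64x8_k32_s32
[INFO][2022-12-04 17:43:12.017][algo_conv_hmma.cc:141] Compiling /features/features.4/conv/conv.2/Conv
Segmentation fault (core dumped)
"""

print(last_unfinished_compile(SAMPLE_LOG))  # /features/features.4/conv/conv.2/Conv
```

For this report the helper points at /features/features.4/conv/conv.2/Conv, the last node handed to the JIT compiler before the segmentation fault.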
What are the types of GPU/CPU you are using?
RTX 2080 Ti
What's the operating system ppl.nn runs on?
Ubuntu 18.04
What's the compiler and its version?
g++ 7.5.0
nvcc V10.2.89
Which version(commit id or tag) of ppl.nn is used?
2da19ac-dirty
What are the commands used to build ppl.nn?
cmake .. -DCMAKE_BUILD_TYPE=Release -DPPLNN_USE_CUDA=ON -DCMAKE_INSTALL_PREFIX=install
cmake --build . -j 20 --config Release
cmake --build . --target install -j 20 --config Release
What are the execution commands?
./pplnn-build/tools/pplnn --use-cuda --onnx-model=mobilenet_v2.onnx --kernel-type=float16 --export-algo-file=algos/mobilenet_v2_fp16.json
minimal code snippets for reproducing these problems(if necessary)
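import torch
import torchvision

# Load MobileNet V2 with the default pretrained weights (passed by keyword).
model = torchvision.models.mobilenet_v2(weights=torchvision.models.MobileNet_V2_Weights.DEFAULT)
# A single 224x224 RGB image is enough to trace the model for export.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "mobilenet_v2.onnx",
    input_names=["inp"],
    output_names=["out"],
    opset_version=11,
)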
./pplnn-build/tools/pplnn --use-cuda --onnx-model=mobilenet_v2.onnx --kernel-type=float16 --export-algo-file=algos/mobilenet_v2_fp16.json
models and inputs for reproducing these problems (send them to openppl.ai@hotmail.com if necessary)
We do not have a 2080 Ti right now, so we cannot reproduce your error locally. There are two ways I have used to work around this kind of error.
First, you can test whether the error still occurs with the default kernel:
./pplnn-build/tools/pplnn --use-cuda --onnx-model=mobilenet_v2.onnx --kernel-type=float16 --quick-select
Or you can rebuild ppl.nn without JIT and then test again:
cmake .. -DCMAKE_BUILD_TYPE=Release -DPPLNN_USE_CUDA=ON -DCMAKE_INSTALL_PREFIX=install -DPPLNN_ENABLE_CUDA_JIT=OFF
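To compare the two run modes side by side, a small script can launch pplnn with and without --quick-select and report how each run ended. This is only a sketch: the binary path is the one from the report, only flags quoted in this thread are used, and the helper names (`build_command`, `describe_exit`, `run_variant`) are made up for illustration.

```python
import signal
import subprocess

PPLNN = "./pplnn-build/tools/pplnn"  # path taken from the report; adjust for your build


def build_command(model, quick_select=False, binary=PPLNN):
    """Assemble a pplnn command line using only flags quoted in this thread."""
    cmd = [binary, "--use-cuda", f"--onnx-model={model}", "--kernel-type=float16"]
    if quick_select:
        cmd.append("--quick-select")
    return cmd


def describe_exit(returncode):
    """Turn a subprocess return code into a short status string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"failed with exit code {returncode}"


def run_variant(model, quick_select):
    """Run pplnn once and describe how the process ended."""
    result = subprocess.run(build_command(model, quick_select))
    return describe_exit(result.returncode)


if __name__ == "__main__":
    for quick in (False, True):
        label = "quick-select" if quick else "kernel tuning"
        print(label, "->", run_variant("mobilenet_v2.onnx", quick))
```

A run that hits the segmentation fault above would be reported as "killed by SIGSEGV". To try a build made with -DPPLNN_ENABLE_CUDA_JIT=OFF, pass its path through the `binary` argument.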
Using --quick-select fixes the problem, but the profiled performance is poor. Running MobileNet V2 with ppl.nn using the default kernel takes 0.78 ms, while running it with TensorRT takes 0.39 ms.
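To put those two latencies in relative terms, here is a quick calculation using only the figures quoted above (the function name `slowdown` is made up for illustration):

```python
def slowdown(candidate_ms, baseline_ms):
    """How many times slower the candidate is than the baseline."""
    return candidate_ms / baseline_ms


pplnn_default_ms = 0.78  # ppl.nn with the default kernel, from the measurement above
tensorrt_ms = 0.39       # TensorRT, from the measurement above

ratio = slowdown(pplnn_default_ms, tensorrt_ms)
print(f"default kernel is {ratio:.2f}x slower than TensorRT")
```

With these numbers the default kernel is about twice as slow, which is the gap that kernel tuning would need to close.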
Can I use kernel tuning on mobilenet v2?
If you care about performance, you can rebuild ppl.nn without JIT. I think this should also fix the problem you ran into.
cmake .. -DCMAKE_BUILD_TYPE=Release -DPPLNN_USE_CUDA=ON -DCMAKE_INSTALL_PREFIX=install -DPPLNN_ENABLE_CUDA_JIT=OFF
Thanks. The performance of ppl.nn without JIT is great.
Does ppl.nn only support fp16 on the 2080 Ti? I tried setting the kernel type to fp32, but it failed.
./build-no-jit/tools/pplnn --use-cuda --onnx-model=resnet18.onnx --kernel-type=float32
[ERROR][2022-12-07 14:42:06.426][sequential_scheduler.cc:129] exec kernel[/maxpool/MaxPool] of type[:MaxPool:11] failed: unsupported
[ERROR][2022-12-07 14:42:06.426][runtime_impl.cc:337] Run() failed: unsupported
[ERROR][2022-12-07 14:42:06.426][pplnn.cc:1315] Run() failed: unsupported
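When a run fails like this, the first ERROR line already names the kernel and operator that were rejected. The sketch below extracts those fields from a saved log. It assumes the message format shown above, and reading `type[:MaxPool:11]` as domain, operator type, and version is my interpretation, not something confirmed in this thread. The name `find_failed_kernels` is hypothetical.

```python
import re

# Assumed layout: exec kernel[<node>] of type[<domain>:<op>:<version>] failed: <reason>
KERNEL_FAILURE = re.compile(
    r"exec kernel\[(?P<kernel>[^\]]+)\] "
    r"of type\[(?P<domain>[^:\]]*):(?P<op>[^:\]]+):(?P<version>\d+)\] "
    r"failed: (?P<reason>.+)"
)


def find_failed_kernels(log_text):
    """Return one dict per 'exec kernel ... failed' line found in the log."""
    return [match.groupdict() for match in KERNEL_FAILURE.finditer(log_text)]


SAMPLE = (
    "[ERROR][2022-12-07 14:42:06.426][sequential_scheduler.cc:129] "
    "exec kernel[/maxpool/MaxPool] of type[:MaxPool:11] failed: unsupported\n"
    "[ERROR][2022-12-07 14:42:06.426][runtime_impl.cc:337] Run() failed: unsupported\n"
)

for failure in find_failed_kernels(SAMPLE):
    print(failure["kernel"], failure["op"], failure["reason"])
```

On the log above this reports the single failing node /maxpool/MaxPool with reason "unsupported". The later "Run() failed" lines are ignored because they do not name a kernel.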
We only support fp16/int8 kernels for conv right now. So, the runtime crashed before conv.
OK. Thanks for your reply.
Would you mind sending your broken model to openppl.ai@hotmail.com?