[BUG] IOT BYOM Issue: <insert issue here>
Describe the issue
I trained a custom model using the Ultralytics repository at commit 3208eb72ef277b0b825306a84df6c460a8406647. The model was trained on a custom dataset that detects persons and forklifts, and the output was a .pt file named forklifts.pt. I want to convert the model to .tflite and quantize it to run on the QCS6490 chipset. I used the AI Hub package, running the following:
python -m qai_hub_models.models.yolov8_det_quantized.export --device "QCS6490 (Proxy)" --ckpt-name "/home/mbenencase/Downloads/forklifts.pt" --output-dir forklifts_new
This is the job details link: https://app.aihub.qualcomm.com/jobs/jp14q9qlp/
The problem is that I am not able to delegate all the nodes to the NPU. This is the output shown on the job link above:
------------------------------------------------------------
Performance results on-device for Yolov8_Det_Quantized.
------------------------------------------------------------
Device : QCS6490 (Proxy) (12)
Runtime : TFLITE
Estimated inference time (ms) : 51.0
Estimated peak memory usage (MB): [1, 37]
Total # Ops : 267
Compute Unit(s) : GPU (227 ops) NPU (10 ops) CPU (30 ops)
------------------------------------------------------------
More details: https://app.aihub.qualcomm.com/jobs/jp14q9qlp/
tmp0eymkt7b.h5: 100%|████████████████████████████████████████████████████████| 154k/154k [00:00<00:00, 203kB/s]
Comparing on-device vs. local-cpu inference for Yolov8_Det_Quantized.
+---------------+--------------+--------+
| output_name | shape | psnr |
+===============+==============+========+
| boxes | (1, 8400, 4) | 73.76 |
+---------------+--------------+--------+
| scores | (1, 8400) | 62.87 |
+---------------+--------------+--------+
- psnr: Peak Signal-to-Noise Ratio (PSNR). >30 dB is typically considered good.
This is odd, because I can delegate all the nodes of the YOLOv8 quantized model available on AI Hub to the DSP.
To Reproduce
job link: https://app.aihub.qualcomm.com/jobs/jp14q9qlp/
Host configuration:
- Ubuntu 20.04
- Chrome
Hi Marcelo,
It looks like the model you profiled in the link is FP32. That's why it's not running on the HTP (NPU): your device does not support FP32 inference on the HTP.
Looking further, it looks like you uploaded a model with empty AIMET encodings:
https://app.aihub.qualcomm.com/models/mnjx3p91q
The problem is we don't use existing AIMET encodings when you provide your own weights. This is a bug (we should quantize for you in this case) that will be resolved for YOLOv8 in our next release.
In the meantime, there are a few ways you can resolve this:
First, try to quantize with a QuantizeJob. You can follow the example in .
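For reference, here is a minimal sketch of that flow with the qai_hub client, assuming a quantize job that consumes an ONNX model plus calibration data; the input name "image", the 640x640 shape, and the random calibration tensors are placeholders you would replace with your real preprocessed images:
import numpy as np
import qai_hub as hub

# Compile the traced model to ONNX first; quantize jobs consume an ONNX model.
compile_job = hub.submit_compile_job(
    model="model.pt",  # your traced TorchScript model
    device=hub.Device("QCS6490 (Proxy)"),
    input_specs={"image": (1, 3, 640, 640)},
    options="--target_runtime onnx",
)

# Calibration data: dict of input name -> list of numpy arrays.
# Replace the random tensors with real preprocessed images from your dataset.
calibration_data = {
    "image": [np.random.rand(1, 3, 640, 640).astype(np.float32) for _ in range(10)]
}

quantize_job = hub.submit_quantize_job(
    model=compile_job.get_target_model(),
    calibration_data=calibration_data,
    weights_dtype=hub.QuantizeDtype.INT8,
    activations_dtype=hub.QuantizeDtype.INT8,
)
quantized_model = quantize_job.get_target_model()
The quantized model returned by the quantize job can then be compiled for the device as usual.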
Alternatively, generate the encodings yourself and upload them alongside the model:
- Generate encodings with https://github.com/quic/ai-hub-models/blob/2fc5329766811cd92c7e140259866fe78a8315a1/scripts/examples/quantize_detector_coco.py
- Trace and save the model:
import torch
from qai_hub_models.models.yolov8_det_quantized import Model  # assumption: adjust the import to your model package

# Load the model, export it to TorchScript, and save it
model = Model.from_pretrained(...)
ts = model.convert_to_torchscript()
torch.jit.save(ts, "model.pt")
- Copy the model and encodings into a directory and zip it:
mkdir model
cd model
mv <path>/model.pt model.pt
mv <path>/model.encodings model_torch.encodings
cd ..
zip -r model.zip model
- Upload the model to AI Hub and compile it.
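A minimal sketch of that last step with the qai_hub client (the input name and shape below are assumptions; adjust them to your model):
import qai_hub as hub

# Upload the zipped AIMET model (TorchScript + encodings), then compile it
# for the target device.
aimet_model = hub.upload_model("model.zip")
compile_job = hub.submit_compile_job(
    model=aimet_model,
    device=hub.Device("QCS6490 (Proxy)"),
    input_specs={"image": (1, 3, 640, 640)},
    options="--target_runtime tflite",
)
compiled_model = compile_job.get_target_model()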