ultralytics/yolov3

Unexpectedly large trained model size (~200 MB .pt and ~400 MB .onnx)

Shassk opened this issue · 4 comments

Shassk commented

Search before asking

Question

Hi! While trying to figure out the performance and prediction-quality differences between v5 and v8, I decided to also try v3, on which we had trained our first set of models.
And the results were... confusing, to put it lightly.
With the default 640x640 size set manually (to be sure), the sizes of the models exported to ONNX were:

  • v8: 11.7 MB .onnx
  • v5: 7.13 MB .onnx
  • v3: 395.7 MB .onnx

And the performance drop during training was huge as well: I had to set batch to 16 to barely fit into the 15 GB of VRAM on an Nvidia T4, compared to ~11 GB with batch 56 on v5.
The code and the dataset (images of cars, 1 class: car) used for all 3 were the same, aside from the yolo_version and yolo_type variables set in the previous cell (Google Colab notebook):

from ultralytics import YOLO
import subprocess
model = YOLO(f'yolov{yolo_version}{yolo_type}.yaml')  # build a new model from YAML
# batch 56 16gb gpu, 246 40gb gpu
model.train(data=f'/content/datasets/{import_name}/custom.yaml', epochs=epochs, imgsz=size, batch=16, cache=True, lr0=0.001)

Our previous v3 models were trained with Darknet and then parsed in a Python script into weights for an ONNX model.
Those models turned out quite a bit larger than v5/v8, at around 33.1-33.4 MB, but I was not expecting a size difference this big.
Which is really unfortunate, since your version allows for a smoother and easier train/export process.
So what can cause this? Is there a solution? Maybe some additional setting is needed?

Additional

No response

👋 Hello @Shassk, thank you for your interest in YOLOv3 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a ๐Ÿ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov3  # clone
cd yolov3
pip install -r requirements.txt  # install

Environments

YOLOv3 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv3 CI

If this badge is green, all YOLOv3 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv3 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

@Shassk hello! Thanks for reaching out with your observations. The size discrepancy you're seeing in the YOLOv3 ONNX model compared to v5 and v8 is indeed unusual. Here are a few things to consider:

  1. Model Complexity: YOLOv3 has a different architecture compared to v5 and v8, which might inherently lead to different model sizes. However, the difference should not be as drastic as you've described.

  2. Export Settings: When exporting to ONNX, ensure that you're using the same settings, such as --simplify, which can reduce the model size by eliminating redundant operations (see the sketch after this list).

  3. Training Configuration: Double-check your training configuration. Differences in layer configurations or model depth can lead to larger models.

  4. Pruning: If model size is a critical factor, consider applying model pruning techniques before exporting to ONNX. This can help reduce the size and complexity of the model.

  5. Optimization: Post-training optimization techniques can also be applied to the ONNX model to reduce its size.
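
As a rough illustration of point 2, here is a minimal export sketch using the same ultralytics Python API as the training snippet above; the checkpoint path and image size are placeholders for your own run:

from ultralytics import YOLO

# Placeholder path: point this at the checkpoint produced by your training run.
model = YOLO('runs/detect/train/weights/best.pt')
# Export to ONNX with graph simplification; `simplify` here is the Python-API
# counterpart of the --simplify flag mentioned above.
model.export(format='onnx', imgsz=640, simplify=True)

Simplification folds constants and removes redundant nodes, so it tightens the graph but does not change the number of weights themselves.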

If you've ensured all the above and the issue persists, it might be worth looking into the specifics of how the ONNX model is being saved. Sometimes, additional metadata or training information can bloat the file size.
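
If you want to see where the ONNX bytes actually go, something along these lines (a rough sketch using the onnx Python package; the filename is a placeholder) lists the largest stored tensors:

import onnx
from onnx import numpy_helper

model = onnx.load('yolov3_custom.onnx')  # placeholder filename

# Rank the stored weight tensors (initializers) by size.
sizes = [(init.name, numpy_helper.to_array(init).nbytes) for init in model.graph.initializer]
sizes.sort(key=lambda s: s[1], reverse=True)

print(f'{len(sizes)} initializers, {sum(n for _, n in sizes) / 1e6:.1f} MB total')
for name, nbytes in sizes[:10]:
    print(f'{nbytes / 1e6:8.1f} MB  {name}')

If the total is dominated by float32 convolution weights, the size reflects the parameter count at full precision rather than duplicated metadata.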

For further assistance, please refer to our documentation or consider opening an issue with detailed information about your training configuration and export process. We're here to help! 😊

Shassk commented

It's not so much about the export size as about the size of the primary trained model in .pt format: 200 MB is not what I expected. But sure, I will create a new issue with all the data.

@Shassk apologies for the confusion, and thank you for your patience. A 200 MB .pt file for YOLOv3 is indeed larger than typical. Here are a few quick checks you can do:

  1. Model Architecture: Verify that the model architecture in the .yaml file matches the expected YOLOv3 architecture without unintended modifications.

  2. Weights: Ensure that the model isn't accidentally saving additional weights or data that it shouldn't be.

  3. Optimizer State: The .pt file includes both the model weights and the optimizer state. A large optimizer state can inflate the file size considerably (see the sketch after this list).

  4. Precision: Check if the model is being saved with higher precision (e.g., float64) than necessary (float32 is standard).
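
On point 3, a quick way to see how much of the 200 MB is training state is to drop everything except the weights and re-save. This is only a rough sketch: it assumes the checkpoint is a dict with 'optimizer', 'ema' and 'model' entries, as in recent ultralytics checkpoints, and the filenames are placeholders:

import os
import torch

src, dst = 'last.pt', 'stripped.pt'  # placeholder filenames
ckpt = torch.load(src, map_location='cpu')  # newer PyTorch may need weights_only=False

# Drop optimizer state and training bookkeeping (assumed checkpoint layout).
for k in ('optimizer', 'updates', 'best_fitness'):
    ckpt.pop(k, None)

# Keep the EMA weights if present (fall back to 'model') and store them in FP16.
weights = ckpt.get('ema') or ckpt.get('model')
if weights is not None:
    ckpt['model'] = weights.half()
    ckpt['ema'] = None

torch.save(ckpt, dst)
print(f'{dst}: {os.path.getsize(dst) / 1e6:.1f} MB')

If I recall correctly, the yolov3/yolov5 repos also ship a strip_optimizer utility in their utils that does essentially this on best.pt/last.pt at the end of training. Also note that the default yolov3 model has tens of millions of parameters, far more than the small v5/v8 variants, so some of the gap is expected from the architecture alone (point 1 above).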

If these checks don't reveal any issues, please do open a new issue with the details of your training setup, and we'll take a closer look to help resolve this. Your feedback is invaluable in improving the tools we provide to the community. 🌟