ultralytics/yolov3

Training requires much more VRAM than v5/v8 and results in ~200 MB models compared to <15 MB models of v5/v8

Shassk opened this issue · 5 comments

Shassk commented

Search before asking

  • I have searched the YOLOv3 issues and found no similar bug report.

YOLOv3 Component

Training

Bug

When trying to train a yolov3n model to compare its size and performance to v5 and v8, I got incredibly large (~200 MB) best.pt and last.pt models. On top of that, the train call took a very large amount of VRAM (almost 40 GB with imgsz=640, batch=16).
Training was performed in a Google Colab notebook, changing only a single value, the version of the YOLO model used in this call:
model = YOLO(f'yolov{yolo_version}{yolo_type}.yaml')
to switch from yolov8n.yaml to yolov3n.yaml and yolov5n.yaml.

Environment

Google Colab with Pro+ subscription and A100 GPU, train log prints this line:
Ultralytics YOLOv8.1.5 🚀 Python-3.10.12 torch-2.1.0+cu121 CPU (Intel Xeon 2.20GHz)
Should be

Minimal Reproducible Example

Provided in the Colab notebook in the attached archive along with a minimal dataset.
The first cell contains the following variables:

yolo_version = 3
yolo_type = "n"
size = 416
epochs = 100
import_folder = "yv3"
import_name = "car"
export_folder = f"yv{yolo_version}"
export_name = f"{import_name}-{size}"

which form these paths for the dataset and export folder:

print(f'YOLO config    : yolov{yolo_version}{yolo_type}.yaml')
print(f'Dataset config : /content/datasets/{import_name}/custom.yaml')
print(f'Dataset archive: /drive/MyDrive/Colab Notebooks/{import_folder}/{import_name}.zip')
print(f'Export location: /drive/MyDrive/Colab Notebooks/{export_folder}/{export_name}')

You might also need to adjust the batch size in the train call, since the current value of 96 takes almost 21 GB of VRAM.

v3_minimal_reproducible_example.zip

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

@Shassk hello! Thank you for bringing this to our attention. It's quite unusual for YOLOv3 models to result in such large file sizes, especially when compared to YOLOv5 and YOLOv8 models. Typically, YOLOv3 models should be smaller in size.

Regarding the VRAM usage, YOLOv3 can indeed be more demanding than its successors, but the amount you're reporting seems excessive. Here are a few steps you can take to troubleshoot the issue:

  1. Model Configuration: Ensure that the yolov3n.yaml configuration file is correct and corresponds to a "nano" version of YOLOv3, which should be smaller and less demanding on resources.

  2. Batch Size: Try reducing the batch size further to see if it significantly impacts VRAM usage. This could help isolate whether the issue is with the model architecture or the training setup.

  3. Dataset: Verify that the dataset is being loaded correctly and that there are no anomalies in the data that could be causing larger model sizes (e.g., extremely high-resolution images).

  4. Training Script: Double-check the training script and any modifications you've made to ensure there are no unintended changes affecting model size or VRAM usage.

  5. PyTorch and CUDA Versions: Ensure that you're using compatible versions of PyTorch and CUDA. Sometimes, mismatches can lead to inefficient resource usage.

  6. Compare with Pretrained Models: If possible, compare your trained model with a pretrained YOLOv3 model to see if the file size discrepancy persists.

If the issue continues after these checks, please provide detailed logs and the exact configuration file used during training. This will help us diagnose the problem more effectively.

For further guidance, you can refer to our documentation at Ultralytics Docs.

Thank you for your contribution to the YOLO community, and we appreciate your patience as we work to resolve this issue. 🚀

Shassk commented
  • Model Configuration: Ensure that the yolov3n.yaml configuration file is correct and corresponds to a "nano" version of YOLOv3, which should be smaller and less demanding on resources.

Switching yolo_type from n to s in yolov{yolo_version}{yolo_type}.yaml resulted in exactly the same size (198.1 MB best.pt) and model structure (compared in Netron, can share the link to the models if needed).

  • Batch Size: Try reducing the batch size further to see if it significantly impacts VRAM usage. This could help isolate whether the issue is with the model architecture or the training setup.

Batch size increases VRAM usage almost linearly. I've made several test runs now and for 640x640 size batch=16 results in ~16.9 GB of usage while batch=8 results in ~10.5 GB.
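As a rough illustration of the "almost linear" observation, the two measurements above can be fitted with a straight line. This is only a sketch: it assumes VRAM is a fixed overhead plus a constant per-image cost, which two data points cannot confirm, and the function names are made up for this example.

```python
def fit_linear_vram(batch_a, vram_a, batch_b, vram_b):
    """Fit vram = fixed + per_image * batch through two measurements (in GB)."""
    per_image = (vram_b - vram_a) / (batch_b - batch_a)
    fixed = vram_a - per_image * batch_a
    return fixed, per_image


def predict_vram(fixed, per_image, batch):
    """Extrapolate the fitted line to another batch size (assumption: still linear)."""
    return fixed + per_image * batch


# Measurements reported above for yolov3n.yaml at imgsz=640.
fixed_gb, per_image_gb = fit_linear_vram(8, 10.5, 16, 16.9)
print(f"fixed overhead ~{fixed_gb:.1f} GB, ~{per_image_gb:.2f} GB per image")
print(f"estimated VRAM at batch=32: ~{predict_vram(fixed_gb, per_image_gb, 32):.1f} GB")
```

Under that assumption the numbers imply roughly 4 GB of batch-independent usage and a bit under 1 GB per image at 640x640, which is what makes the model look far heavier than a nano-sized network.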

  • Dataset: Verify that the dataset is being loaded correctly and that there are no anomalies in the data that could be causing larger model sizes (e.g., extremely high-resolution images).

There are no error messages about boxes in the training dataset, image sizes don't exceed 1376x929, and there are only 67 images in the train folder of the provided archive.
My main dataset of 1487 images, all within 1920x1080, gave exactly the same result (the provided test data is a small portion of it).

  • Training Script: Double-check the training script and any modifications you've made to ensure there are no unintended changes affecting model size or VRAM usage.

It is directly set in the train call (96 is for 416x416 v3n):
model.train(data=f'/content/datasets/{import_name}/custom.yaml', epochs=epochs, imgsz=size, batch=96, cache=True, lr0=0.001)
There's pretty much nothing else.
If you mean my custom.yaml, it's pretty simple and should affect only classes and dataset folders:

path: ./car
train: images/train
val: images/val

nc: 1
names: ['car']
  • PyTorch and CUDA Versions: Ensure that you're using compatible versions of PyTorch and CUDA. Sometimes, mismatches can lead to inefficient resource usage.

If that were the case, other versions of YOLO would have had problems as well. The log output from the Python code shows this:
Ultralytics YOLOv8.1.6 🚀 Python-3.10.12 torch-2.1.0+cu121 CUDA:0 (NVIDIA A100-SXM4-40GB, 40514MiB)

And nvidia-smi reports this:

Fri Jan 26 08:06:15 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0              43W / 400W |      2MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
  • Compare with Pretrained Models: If possible, compare your trained model with a pretrained YOLOv3 model to see if the file size discrepancy persists.

When I tried this using yolov3-tiny.pt and yolov3.pt from https://github.com/ultralytics/yolov3/releases, the resulting .onnx models were vastly different from the ones I had trained.
However, during export I also got this log, which suggests those pretrained models might have been replaced:

PRO TIP  Replace 'model=yolov3-tiny.pt' with new 'model=yolov3-tinyu.pt'.
YOLOv5 'u' models are trained with https://github.com/ultralytics/ultralytics and feature improved performance vs standard YOLOv5 models trained with https://github.com/ultralytics/yolov5.

Downloading https:\github.com\ultralytics\assets\releases\download\v0.0.0\yolov3-tinyu.pt to yolov3-tinyu.pt...
100%|██████████| 23.3M/23.3M [00:19<00:00, 1.28MB/s]
Ultralytics YOLOv8.0.107  Python-3.10.11 torch-2.0.1+cpu CPU
YOLOv3-tiny summary (fused): 63 layers, 12168784 parameters, 0 gradients, 19.0 GFLOPs

PyTorch: starting from yolov3-tinyu.pt with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 2000) (23.3 MB)

ONNX: starting export with onnx 1.14.0 opset 17...
============== Diagnostic Run torch.onnx.export version 2.0.1+cpu ==============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

ONNX: export success  0.8s, saved as yolov3-tinyu.onnx (46.5 MB)

Export complete (1.2s)
Results saved to D:\
Predict:         yolo predict task=detect model=yolov3-tinyu.onnx imgsz=640
Validate:        yolo val task=detect model=yolov3-tinyu.onnx imgsz=640 data=coco.yaml
Visualize:       https://netron.app
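The sizes in the log above can be sanity-checked against the reported parameter count. The sketch below assumes the .pt checkpoint stores weights at 2 bytes per parameter (FP16), the ONNX export at 4 bytes (FP32), and that the "MB" figures are really MiB; none of that is stated in the log, and the helper names are hypothetical.

```python
MIB = 1 << 20  # bytes per MiB; assumed to be the unit behind the "MB" figures


def weights_size_mib(n_params, bytes_per_param):
    """Size of the raw weights alone, ignoring metadata and container overhead."""
    return n_params * bytes_per_param / MIB


def params_from_size(size_mib, bytes_per_param):
    """Invert the estimate: approximate parameter count for a given file size."""
    return int(size_mib * MIB / bytes_per_param)


tiny_params = 12_168_784  # from the "YOLOv3-tiny summary" line in the log above

print(f"FP16 weights: {weights_size_mib(tiny_params, 2):.1f} MiB (log: 23.3 MB .pt)")
print(f"FP32 weights: {weights_size_mib(tiny_params, 4):.1f} MiB (log: 46.5 MB .onnx)")

# Applying the same FP16 assumption to the 198.1 MB best.pt reported earlier.
print(f"198.1 MB at 2 bytes/param ~ {params_from_size(198.1, 2) / 1e6:.0f}M parameters")
```

If those assumptions hold, a 198.1 MB checkpoint corresponds to roughly a hundred million parameters, which is far beyond anything nano-sized and points at the architecture rather than the dataset or the save step.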

To reiterate:

  1. VRAM usage:
    • v3n imgsz=416, batch=96: ~21 GB of VRAM
    • v3n imgsz=640, batch=16: ~16.9 GB of VRAM
    • v3n imgsz=640, batch=8: ~10.5 GB of VRAM
    • v3s imgsz=640, batch=8: ~10.5 GB of VRAM
  2. Changing dataset does not change anything in model size or VRAM usage.
  3. Changing v3n to v3s resulted in the same size, internal structure, and VRAM usage.
  4. Pretrained models are significantly different both in size and in structure from both my v3s and v3n.
  5. During all my runs only the parameters batch, YOLO type (via the yolo_type variable), and imgsz (via the size variable) were changed. For v5n and v8n, yolo_version was changed as well. Everything else was kept the same.

@Shassk, thanks for the detailed follow-up. It's clear you've done extensive troubleshooting. Here are some additional thoughts:

  1. Model Size: The similarity in size between yolov3n and yolov3s models is unexpected. It's possible there might be an issue with the configuration files or the way the model is being saved. Please ensure that the yolov3n.yaml and yolov3s.yaml files indeed define different, smaller architectures compared to the full yolov3 model.

  2. VRAM Usage: The linear increase in VRAM usage with batch size is normal, but the overall high usage for yolov3n and yolov3s is still concerning. It might be worth comparing the VRAM usage with a known baseline or benchmark for these models if available.

  3. Pretrained Models: The log message you're seeing is suggesting an upgrade to a newer "u" version of the model, which might not be directly comparable to the original yolov3 models. It's important to compare against the correct baseline.

  4. Training Script: The training script seems straightforward, but it might be worth running a sanity check with a known good configuration or script to rule out any hidden issues.

  5. PyTorch and CUDA Versions: While other versions of YOLO might not have issues, it's still possible that yolov3 implementations could behave differently under certain versions of PyTorch or CUDA. However, since you're using compatible versions, this is less likely to be the cause.

Given the information you've provided, it seems there might be an issue with the yolov3 implementation you're using. If you're confident that the dataset and training scripts are correct, and the configuration files for yolov3n and yolov3s are indeed defining smaller models, then this might require a deeper investigation into the codebase.

At this point, I would recommend:

  • Double-checking the yolov3n.yaml and yolov3s.yaml files for correctness.
  • Comparing the VRAM usage against a known benchmark for these models.
  • Ensuring that the correct pretrained models are being used for comparison.
  • Possibly opening an issue with a detailed report, including the configuration files and any logs that might help identify the problem.

Your efforts in investigating this issue are invaluable, and I'm confident that with continued collaboration, we can get to the bottom of this. Thank you for your dedication to improving the YOLOv3 experience! 🌟

Shassk commented

Double-checking the yolov3n.yaml and yolov3s.yaml files for correctness.

As I understand it, the standard built-in models from the ultralytics Python package are used. And at least in the case of v8 there's only one yolov8.yaml file, with the changes for s, n, etc. being applied when it is loaded in the model constructor.
After creating new models like this:

from ultralytics import YOLO
v3n = YOLO('yolov3n.yaml')
v3s = YOLO('yolov3s.yaml')
print('v3n:\n', v3n)
print('v3s:\n', v3s)

I've got exactly the same model structure.
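Printed model structures are long and easy to misread, so a parameter count can make the same comparison more direct. This is a minimal sketch: count_parameters and compare_configs are names invented here, and it assumes that YOLO(name).model is the underlying torch module, so that .parameters() and .numel() are available.

```python
def count_parameters(module):
    """Sum element counts over anything exposing a torch-style .parameters()."""
    return sum(p.numel() for p in module.parameters())


def compare_configs(config_names):
    """Build each config and return {config_name: parameter_count}."""
    # Imported lazily so count_parameters itself has no third-party dependency.
    from ultralytics import YOLO

    # Assumption: YOLO(name).model is the underlying torch module.
    return {name: count_parameters(YOLO(name).model) for name in config_names}
```

Calling compare_configs(["yolov3n.yaml", "yolov3s.yaml", "yolov3-tiny.yaml"]) would then show directly whether the n and s configs really build the same network, without comparing the printed layers by eye.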
However, after examining this repository's sources I tried passing yolov3-tiny.yaml instead, and the model did indeed become much smaller.
Does this mean v3 does not follow the v5/v8 naming convention with n, s, etc. suffixes and has only 2 size options: regular and tiny?
The docs at https://docs.ultralytics.com/models/yolov3/ do not mention any of this.
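One possible explanation for yolov3n.yaml and yolov3s.yaml building the same network is sketched below. It assumes the loader strips the size letter to find a shared base config and that the v3 base config defines no per-size scaling; that is a guess about the mechanism, not code taken from the package. AVAILABLE_CONFIGS and resolve_config are hypothetical names.

```python
import re

# Hypothetical stand-in for the set of config files shipped with the package.
AVAILABLE_CONFIGS = {"yolov3.yaml", "yolov3-tiny.yaml", "yolov5.yaml", "yolov8.yaml"}

# Matches names such as "yolov8n.yaml": base name, one size letter, extension.
SIZE_SUFFIX = re.compile(r"^(yolov\d+)([nsmlx])(\.yaml)$")


def resolve_config(name):
    """Return (config_file, scale) for a requested name; scale is None when absent."""
    if name in AVAILABLE_CONFIGS:
        return name, None
    match = SIZE_SUFFIX.match(name)
    if match:
        base = match.group(1) + match.group(3)
        if base in AVAILABLE_CONFIGS:
            return base, match.group(2)
    raise FileNotFoundError(name)


for requested in ("yolov8n.yaml", "yolov3n.yaml", "yolov3s.yaml", "yolov3-tiny.yaml"):
    print(requested, "->", resolve_config(requested))
```

In this sketch both v3 requests resolve to the full yolov3.yaml. If that file ignores the scale letter, the result would be the identical full-size structures seen above, while yolov3-tiny.yaml resolves to its own, smaller file.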

Comparing the VRAM usage against a known benchmark for these models.

Not sure what can be considered a known benchmark here, but I can try to compare it to v8 of different sizes and update this message with the data.
For now I've started v3-tiny with batch=246 and it takes ~35.5 GB which is about what I'd expect from v5n/v8n as well.

Ensuring that the correct pretrained models are being used for comparison.

Those were the only available models from this repo's releases. If you have other suggestions for sourcing them - I'll try them as well.

@Shassk thank you for the update and for your continued investigation.

Regarding the model configuration files, it appears there may have been a misunderstanding. YOLOv3 does not follow the same naming convention as YOLOv5 and YOLOv8 with the n, s, m, l, x suffixes for different model sizes. YOLOv3 typically has two main variants: the full YOLOv3 model and the YOLOv3-Tiny model. The latter is a significantly smaller and faster model designed for constrained environments or real-time applications.

The yolov3-tiny.yaml configuration file you found is indeed the correct one for the tiny version of YOLOv3. The full YOLOv3 model does not have smaller variants like n or s as in the later versions of YOLO.
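For a notebook that builds the config name from yolo_version and yolo_type, a small guard can catch the mismatch before training starts. This is a sketch under the assumptions of this thread: v5 and v8 take the n/s/m/l/x suffixes, v3 offers only the full and tiny variants, and the full config is named yolov3.yaml. The config_name helper and its tables are hypothetical.

```python
SCALED_VERSIONS = {5, 8}  # versions in this thread that use n/s/m/l/x suffixes
SCALE_LETTERS = ("n", "s", "m", "l", "x")
# Assumed file names for the two YOLOv3 variants discussed above.
V3_VARIANTS = {"": "yolov3.yaml", "tiny": "yolov3-tiny.yaml"}


def config_name(yolo_version, yolo_type):
    """Build a config file name, rejecting size suffixes that YOLOv3 lacks."""
    if yolo_version == 3:
        if yolo_type not in V3_VARIANTS:
            raise ValueError(
                f"YOLOv3 has no '{yolo_type}' variant; use '' (full) or 'tiny'"
            )
        return V3_VARIANTS[yolo_type]
    if yolo_version in SCALED_VERSIONS and yolo_type in SCALE_LETTERS:
        return f"yolov{yolo_version}{yolo_type}.yaml"
    raise ValueError(f"unsupported combination: v{yolo_version} / '{yolo_type}'")


print(config_name(8, "n"))
print(config_name(3, "tiny"))
```

With this in place, config_name(3, "n") raises a ValueError instead of silently training the full-size model.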

For VRAM usage benchmarks, it's challenging to provide a specific known benchmark as it can vary based on the system configuration, CUDA version, and other factors. However, the VRAM usage you've reported for yolov3-tiny with a large batch size seems to be more in line with expectations.

As for pretrained models, the ones you've used from the repository's releases are the correct ones for comparison. The log message you saw earlier was suggesting an upgrade to a newer "u" version, which is not necessary for your comparison purposes.

In summary, it seems like the confusion stemmed from the difference in naming conventions and available model variants between YOLOv3 and its successors. Now that you've identified the correct yolov3-tiny.yaml configuration file, you should be able to proceed with your comparisons and training with a clearer understanding of the expected model sizes and resource usage.

If you have any further questions or encounter additional issues, please feel free to reach out. Your contributions are greatly appreciated by the community. Keep up the great work! 🚀