- Warning
- Brief overview
- Not so brief overview
- Results
- Project structure (folders and files with brief explanations)
- Datasets (detailed)
- Models FLOPS and params table
- Testing (how to, and the format of the output)
- Requirements
- FAQ and Frequent Errors
- Installation on NVIDIA Jetson
This code is a bit of a mess. No time was left for refactoring, so some solutions are ugly, and some scripts might only work when executed from their own directory or from the main directory. Also, `update_paths.sh` needs to be used, because some files contain absolute paths (for example, `paths.py` needs the paths of the datasets). Alternatively, the paths can be updated manually if you don't trust the script (it asks before making any changes).
Most Python scripts use argparse, so `--help` displays the available options. However, older scripts (such as those in `dataset/`) usually don't!
This project focuses on benchmarking YOLOv8 object detectors on surveillance-type traffic images. Models of different sizes (including the non-standard YOLOv8-pico and YOLOv8-femto) with different input resolutions can be tested with different backends (PyTorch, ONNX Runtime, TensorRT) on different devices. When deploying to TensorRT, quantization to FP16 and INT8 can be applied.
This project heavily utilized libraries of the OpenMMLab project, including MMYOLO, MMDetection and MMDeploy. For specific versions, see `requirements.txt`. Newer versions of MMYOLO should work.
Thirteen different models were trained:
- YOLOv8-medium 640x384
- YOLOv8-small 640x384
- YOLOv8-large with MobileNetV2 backbone 512x288 (indices used: [2, 4, 6])
- YOLOv8-nano 640x384
- YOLOv8-nano 512x288
- YOLOv8-nano 448x256
- YOLOv8-pico 512x288
- YOLOv8-pico 448x256
- YOLOv8-pico 384x224
- YOLOv8-femto 512x288
- YOLOv8-femto 448x256
- YOLOv8-femto 384x224
- YOLOv8-femto 352x192
Six datasets are used: UA-DETRAC (reannotated), MIO-TCD (test subset), AAU RainSnow (RGB camera only), MTID (beware of unannotated frames), NDIS Park (transporters and trailers were reannotated) and VisDrone (DET subset). Details will be explained later. Scripts for processing the datasets (and combining them) are provided.
When training standard YOLOv8 models (medium, small, nano) and using the MobileNetV2 backbone, pre-trained models were used (found in MMYOLO and MMPretrain (previously known as MMClassification) repositories).
All models can be deployed to both ONNX and TensorRT, either as static or as dynamic in the batch-size dimension (dynamic batch, static input resolution). When deploying to TensorRT, FP16 and INT8 quantization are possible (INT8 requires weight calibration using the validation dataset).
Inference can be done with PyTorch, ONNX Runtime and TensorRT.
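The deploy configs in `deploy/` encode these choices. As a rough, hedged illustration only (field names follow common MMDeploy conventions; the actual files in `deploy/`, with their PLACEHOLDER values filled in by `deploy/common.py`, are authoritative), a dynamic-batch TensorRT FP16 config for a 512x288 model might look roughly like this:

```python
# Illustrative sketch of an MMDeploy-style deploy config (not one of the real
# files in deploy/). The 512x288 resolution and min/opt/max batch sizes are
# example values.
onnx_config = dict(
    input_shape=(512, 288),          # static W x H, only the batch is dynamic
    dynamic_axes={
        'input': {0: 'batch'},
        'dets': {0: 'batch'},
        'labels': {0: 'batch'},
    },
)

backend_config = dict(
    type='tensorrt',
    common_config=dict(fp16_mode=True, max_workspace_size=1 << 30),
    model_inputs=[
        dict(input_shapes=dict(
            input=dict(
                min_shape=[1, 3, 288, 512],
                opt_shape=[8, 3, 288, 512],
                max_shape=[32, 3, 288, 512],
            ))),
    ],
)
```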
Six devices were benchmarked:
- NVIDIA Jetson AGX Xavier (TensorRT and ONNX Runtime; fp32, fp16, int8)
- NVIDIA Jetson Xavier NX (TensorRT and ONNX Runtime; fp32, fp16, int8)
- NVIDIA Jetson Nano (TensorRT and ONNX Runtime; fp32, fp16)
- NVIDIA MX150 (ONNX Runtime GPU)
- Intel Core i7-9850H (ONNX Runtime) - tests were run with a single CPU thread
- Raspberry Pi 4B (ONNX Runtime) - tests were run with a single CPU thread
Batch sizes used when benchmarking: 1, 2, 4, 8, 16, 32
Benchmark results contain tests of all models (all input resolutions) on all devices with all batch sizes (except that some models did not fit into memory with large batch sizes). See the Testing chapter for details.
We recommend reading the project structure before using this repository, because the project is a bit of a mess. Or just don't use this repository.
How the models performed on the test set: https://youtube.com/playlist?list=PLt6xFMYrdS_r923yO-dadzoD1YIBi0zNL
The paper can be found in the repository root as `paper.pdf` or at the publisher's website: https://www.vut.cz/www_base/zav_prace_soubor_verejne.php?file_id=252433.
Training checkpoints (.pth files, some of them) can be found here: https://drive.google.com/drive/folders/12dLInS5LaBuHbThuOophZli-DyCh7t9h?usp=sharing (not sorted by name!).
. # Repository folder
├── checkpoints/ # Where to put pre-trained models
│ ├── print_model_structure.py # Prints info about layers in a PyTorch model
│ └── example_pre-trained_model.pth
├── configs/ # MMYOLO (MMDetection) model configurations. The base is `YOLOv8_m.py`. Edit `settings.py` before training a specific model!
│ ├── settings.py # Contains additional config, such as `num_gpus`, `img_scale`, `max_epochs`, `base_lr`
│ ├── yolov8_f_352x192.py
│ ├── yolov8_f_384x224.py
│ ├── yolov8_f_448x256.py
│ ├── yolov8_f_512x288.py
│ ├── yolov8_l_mobilenet_v2_512x288_indices_246.py
│ ├── yolov8_m.py # Base
│ ├── yolov8_n_448x256.py
│ ├── yolov8_n_512x288.py
│ ├── yolov8_n_640x384.py
│ ├── yolov8_p_384x224.py
│ ├── yolov8_p_448x256.py
│ ├── yolov8_p_512x288.py
│ └── yolov8_s.py
├── dataset/ # Dataset processing scripts and such
│ ├── aau.py # Process AAU from their original format
│ ├── apply_masks.py # Apply masks to a dataset (or datasets)
│ ├── apps/ # Utilize applications - LabelBox and FiftyOne
│ │ ├── detrac_from_labelbox.py # Download the DETRAC dataset from LabelBox
│ │ ├── detrac_to_labelbox.py # Upload the DETRAC dataset to LabelBox
│ │ ├── fiftyone_create.py # Create a FiftyOne dataset
│ │ └── fiftyone_run.py # Run a FiftyOne server
│ ├── combine_datasets.py # Combine all datasets into one (json file)
│ ├── common.py # Defines classes, paths, shared functions and so on
│ ├── copy_dataset_subsets.py # Copies only the necessary data (used images and files) to create a folder with a dataset subset (train, val, test; can be a combination of them) without unnecessary data, like unmasked images and such
│ ├── detrac.py # Process DETRAC from original format (useless after reannotation in LabelBox)
│ ├── __init__.py
│ ├── mio-tcd.py # Process MIO-TCD
│ ├── mtid.py # Process MTID
│ ├── ndis.py # Process NDIS Park
│ ├── split_dataset_into_subsets.py # Split the combined dataset into train, val, test
│ ├── split_test_by_dataset.py # Split the test subset by origin dataset (in our case, to DETRAC and MIO-TCD parts)
│ ├── stats.py # Counts class instances and number of images in the dataset files (combined, train, val, test)
│ ├── visdrone_det.py # Process VisDrone DET
│ ├── visualize_dataset.py # Visualize a processed dataset by drawing bboxes and class names on images
│ └── visualized_to_video.py # Convert visualized dataset (images with bboxes and classes) to videos (one folder -> one video)
├── deploy/ # Scripts and configs for model deployment
│ ├── common.py # Replaces PLACEHOLDERS in deploy configs with specific values for a specific model. Must be used to build a deploy config!
│ ├── config_onnxruntime_dynamic_batch.py # ONNX Runtime deploy config - dynamic batch size
│ ├── config_onnxruntime_static.py # ONNX Runtime deploy config - static shape
│ ├── config_tensorrt_dynamic_batch_fp16.py # ...
│ ├── config_tensorrt_dynamic_batch_int8.py
│ ├── config_tensorrt_dynamic_batch.py
│ ├── config_tensorrt_static.py
│ ├── demo.jpg # Demo image used when deploying
│ ├── deploy_all.py # Deploy a model for all work dirs in the project directory with all deploy configs in this directory
│ ├── deploy_model.py # Deploy a single model
│ └── README.md
├── doc/ # The paper and the source code for the paper
│ └── doc.pdf # The paper...
├── experiments/ # Files generating the plots and tables used in the paper. They are ugly! Don't look at them!!!
│ └── # DO NOT OPEN THIS FOLDER! (because every single line is ugly)
├── save_stream/ # Automatically record RTSP or HLS streams. Not well documented because in the end, it was not used.
│ ├── hls/ # Record HLS streams (m3u8)
│ │ ├── config_example # Example of a config file. Can contain more entries
│ │ ├── process_files.sh # I guess it compresses and maybe uploads the files to cloud?
│ │ ├── README.md
│ │ └── save_stream.py # Save stream specified in the config file
│ ├── mount_and_compress.py # Mounts a cloud folder and compresses all uncompressed files in the folder?
│ └── rtsp/ # Record RTSP streams
│ ├── auto.sh # Do everything automatically
│ ├── process_files.sh
│ ├── README.md
│ └── save_stream.py
├── tests/ # Results of tests, their logs, deploy logs and so on...
│ ├── device_name/ # Not provided in repository!
│ │ ├── work_dir_name/
│ │ │ └── test_tensorrt_dynamic_batch_engine_batch16.log # example
│ │ ├── test_script_logs...
│ │ ├── deploy_script_logs...
│ │ └── test_all_results.json # Contains the complete test results from the device
│ ├── mAP/ # Tests which were run just for mAP values. For whole test dataset, for DETRAC only and for MIO-TCD only. With: i7, onnxruntime, dynamic, batch 8, "none" quantization
│ │ ├── test_all_mAP_results.json
│ │ ├── test_detrac_mAP_results.json
│ │ └── test_mio-tcd_mAP_results.json
│ ├── test_results.json # Merged test results from all devices
│ ├── models_analysis_log.txt # Contains the log of getting FLOPS and the number of params of all models
│ └── tensorflow_dump.png # Validation mAP metrics when training all models. It's a mess!
├── vid/ # Sample videos used, for example, for inference
│ └── MVI_40701.mp4 # Example video we used to visually assess model performance (from DETRAC dataset)
├── working_dir_yolov8_f_512x288/ # Example working dir name
│ ├── best_coco/ # Contains a .pth file with the best epoch based on validation mAP (generated when training)
│ ├── coco_eval.json # File generated by extracting precision and recall from the test script
│ ├── epoch_10.pth # Example checkpoint file name
│ ├── last_checkpoint # Simple text file with absolute path to the last checkpoint, useful when continuing in a training session
│ └── yolov8_f_512x288.py # Model config exactly as was used when training. Contains absolute paths (probably)!
├── custom_modules.py # Contains the custom cut-out augmentation (it is registered by MMYOLO before training)
├── paths.py # Paths to datasets, OpenMMLab libs, checkpoints, work dirs... So very important!
├── update_paths.sh # A simple shell script to update absolute paths in relevant files using `sed`. It lists the files that might be affected and asks the user for confirmation. For example: `sh update_paths.sh "/home/tedro/Downloads/datasets" "/home/user/datasets" && sh update_paths.sh "/home/tedro/Desktop/d_projekty/bp/proj" "/home/user/repo"`
├── visualize_pipeline.py # Visualize a training/validation/test pipeline and for example, see augmented images
├── train.py # Training script. Before training, be sure to at least customize `paths.py`, `configs/yolov8_m.py` and `configs/settings.py`
├── dist_train.py # Training script for distributed training (N GPUs). Number of GPUs specified in `configs/settings.py`
├── inference_ort.py # Run inference using ONNX Runtime. Needs work but works
├── inference_pytorch.py # Run inference using PyTorch (using MMDet)
├── inference_trt.py # Run inference using TensorRT (using MMDet)
├── test_deployed.py # Run a test on a deployed model (using MMDeploy and the test dataset)
├── test_all.py # Runs `test_deployed.py` for all available deployed models
├── collect_test_results.py # Iterates through all working dirs and reads their test logs to save them into `test_all_results.json`
├── merge_test_results.py # When test results are collected from different devices, each having a `test_all_results.json`, this script merges the files into `test_results.json` with all results in a single JSON
├── print_model_structure.py # Reads a config file to print a structure of a model
└── requirements.txt # Pip packages
- For training, the datasets should be in COCO format (bounding boxes as x, y, w, h); see the example below
- UA-DETRAC was reannotated! Read its section for more info
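For illustration, a single annotation entry in that format might look like this (values are made up; class 3 is `car` in the table below):

```python
# Hypothetical COCO-style annotation entry: bbox is [x, y, w, h] in pixels,
# i.e. the top-left corner plus width and height.
annotation = {
    "id": 1,
    "image_id": 42,
    "category_id": 3,                    # car
    "bbox": [100.0, 50.0, 80.0, 40.0],   # x, y, w, h
    "area": 80.0 * 40.0,
    "iscrowd": 0,
}
```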
Number of images:
```
images_total  images_train  images_val  images_test
      218821        203061        7770         7990
```
Class instances in all images:
```
class_name   class_id  instances_total  instances_train  instances_val  instances_test
bicycle             1            14036            13777            115             144
motorcycle          2            17187            16956             97             134
car                 3           916317           846331          29803           40183
transporter         4           123585           112614           4176            6795
bus                 5            64529            62890            702             937
truck               6            38069            36554            712             803
trailer             7             2360             2109            106             145
unknown             8            28712            26238           1258            1216
mask                9                0                0              0               0
sum                           1204795          1117469          36969           50357
```
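For reference, the class ids from the table above expressed as a mapping (illustrative only; the authoritative class definitions live in `dataset/common.py`):

```python
# Unified classes of the combined dataset, as listed in the table above.
CLASSES = {
    1: "bicycle",
    2: "motorcycle",
    3: "car",
    4: "transporter",
    5: "bus",
    6: "truck",
    7: "trailer",
    8: "unknown",
    9: "mask",
}
```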
- https://detrac-db.rit.albany.edu/download
- Night, sunny, cloudy, rainy
- Processed: 733'833 objects in 99'771 frames (71/100 sequences reannotated)
- 25 FPS, 960x540 px
- Many camera angles, although some are repeated
- Original classes: car, bus, van, others (truck); it was therefore reannotated so that two-wheelers and unknown vehicles are annotated too
- Not all instances were annotated, some were masked due to time constraints
- Masks are fixed (the original ones were very bad)
- https://tcd.miovision.com/challenge/dataset.html
- Huge, low-quality cam, surveillance type, many points of view
- Processed: 344'421 instances on 110'000 frames (only the test subset of the object localization part is annotated)
- Pedestrian + 9 vehicle classes (+ background)
- https://www.kaggle.com/datasets/aalborguniversity/aau-rainsnow
- Small, rain, snow, night, ... Surveillance-type, several points of view
- Processed: 10'545 instances on 1'899 frames
- Video, not sure about the FPS
- Classes: pedestrian, bicycle, car, motorbike, bus, truck
- Only RGB part is used
- We ignore these folders (the annotated vehicles in them are impossible to detect because of lens flare):
  `["Egensevej-1", "Egensevej-3", "Egensevej-5"]`
- Transporters are of class "truck", the same as all trucks...
- Several errors in the dataset! See https://www.kaggle.com/datasets/aalborguniversity/aau-rainsnow/discussion/367019
- https://www.kaggle.com/datasets/andreasmoegelmose/multiview-traffic-intersection-dataset?select=infrastructure-mscoco.json
- Small, two cams (one high quality, one less so), surveillance type
- Processed: 64'979 objects on 5'399 frames
- Needs masking! Both cameras' footage contains parked vehicles which are not annotated. Masks have to be applied manually
- 30 FPS
- Classes: Bicycle, Car, Bus, Lorry (transporter is annotated as a truck - lorry)
- Unannotated frames in the drone footage (see the small helper sketch below):
[1, 31], [659, 659], [1001, 1318], [3301, 3327]
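A tiny hypothetical helper showing how those frame ranges can be skipped (`mtid.py` handles this during processing; the function name is made up):

```python
# Frame ranges of the MTID drone footage that have no annotations (inclusive).
UNANNOTATED_RANGES = [(1, 31), (659, 659), (1001, 1318), (3301, 3327)]

def is_annotated(frame_number: int) -> bool:
    """Return True if the given drone-footage frame has annotations."""
    return not any(lo <= frame_number <= hi for lo, hi in UNANNOTATED_RANGES)
```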
- https://zenodo.org/record/6560823
- Tiny, HQ cam, surveillance type, many vehicles in one image
- Processed: 3'302 objects on 142 frames
- Only train and validation subsets are used
- Everything is annotated as a single class; `ndis.py` reannotates it because the dataset contains transporters and trailers
- Small, HQ cam, surveillance-type (drone)
- Processed: 47'720 objects on 1'610 frames
- Classes: pedestrian, person, car, van, bus, truck, motor, bicycle, awning-tricycle, tricycle
- Trailers are assigned the `unknown` class
Pipeline for processing the original downloaded datasets (DOES NOT APPLY FOR DETRAC, because it was reannotated):
```
export d="aau"
py $d.py
py apply_masks.py $d
py split_dataset_into_subsets.py $d
```
- With visualizations:
```
export d="aau"
py $d.py
py apply_masks.py $d
py split_dataset_into_subsets.py $d
py visualize_dataset.py $d
py visualized_to_video.py $d
```
- DETRAC only needs mask application and splitting into subsets, assuming it was already downloaded from LabelBox
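The per-dataset JSON files produced by the pipeline above are later merged by `combine_datasets.py`. A rough sketch of what such merging involves (illustrative only, not the actual script):

```python
# Merge several COCO-style JSON files into one, re-assigning unique image and
# annotation ids. combine_datasets.py is the real implementation.
import json

def combine(json_paths, out_path):
    combined = {"images": [], "annotations": [], "categories": None}
    next_img_id, next_ann_id = 1, 1
    for path in json_paths:
        with open(path) as f:
            ds = json.load(f)
        combined["categories"] = combined["categories"] or ds["categories"]
        id_map = {}
        for img in ds["images"]:
            id_map[img["id"]] = next_img_id
            img["id"] = next_img_id
            combined["images"].append(img)
            next_img_id += 1
        for ann in ds["annotations"]:
            ann["id"] = next_ann_id
            ann["image_id"] = id_map[ann["image_id"]]
            combined["annotations"].append(ann)
            next_ann_id += 1
    with open(out_path, "w") as f:
        json.dump(combined, f)
```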
yolov8_f_352x192.py: 0.0357866 GFLOPS, 0.0590577 M params
yolov8_f_384x224.py: 0.0455466 GFLOPS, 0.07516444 M params
yolov8_f_448x256.py: 0.0607288 GFLOPS, 0.100219 M params
yolov8_f_512x288.py: 0.07808 GFLOPS, 0.1288533 M params
yolov8_l_mobilenet_v2_512x288_indices_246.py: 0.56 GFLOPS, 2.1333 M params
yolov8_m_640x384.py: 7.85 GFLOPS, 18.3907 M params
yolov8_n_448x256.py: 0.35408 GFLOPS, 0.78815 M params
yolov8_n_512x288.py: 0.45525 GFLOPS, 1.01333 M params
yolov8_n_640x384.py: 0.75875 GFLOPS, 1.68888 M params
yolov8_p_384x224.py: 0.0881066 GFLOPS, 0.1789511 M params
yolov8_p_448x256.py: 0.117475 GFLOPS, 0.2386 M params
yolov8_p_512x288.py: 0.15104 GFLOPS, 0.30677 M params
yolov8_s_640x384.py: 2.63822 GFLOPS, 3.73813 M params
Test results can be found in `tests/test_results.json`.
All models (all input resolutions, both static and dynamic shapes) were tested on all devices. Except when a model did not fit into memory, all batch sizes were used. ONNX Runtime was used for tests on all devices, but on NVIDIA Jetson, TensorRT was used as well. Quantization was only used with TensorRT.
```
data = {
  "device": {                 # E.g. "nano", "agx", "rpi", ...
    "model": {                # E.g. "yolov8_f"
      "input_shape": {        # E.g. "352x192"
        "backend": {          # "tensorrt" or "onnxruntime"
          "quantization": {   # "none", "fp16", "int8"
            "model_shape": {  # "dynamic" or "static"
              "batch_size": { # E.g. "1", "8", "32", ...
                "bbox_mAP": "float number in a string",
                "bbox_mAP_50": "float number in a string",
                "bbox_mAP_75": "float number in a string",
                "bbox_mAP_s": "float number in a string",
                "bbox_mAP_m": "float number in a string",
                "bbox_mAP_l": "float number in a string",
                "fps": "float number in a string"
}}}}}}}}
```
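A short sketch of walking this structure, e.g. to print FPS and mAP for every measurement (key values such as device and model names are illustrative):

```python
import json

with open("tests/test_results.json") as f:
    data = json.load(f)

# Walk the nesting described above: device -> model -> input_shape -> backend
# -> quantization -> model_shape -> batch_size -> metrics.
for device, models in data.items():
    for model, shapes in models.items():
        for shape, backends in shapes.items():
            for backend, quants in backends.items():
                for quant, model_shapes in quants.items():
                    for model_shape, batches in model_shapes.items():
                        for batch, m in batches.items():
                            print(device, model, shape, backend, quant,
                                  model_shape, batch,
                                  "fps:", float(m["fps"]),
                                  "mAP:", float(m["bbox_mAP"]))
```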
- To limit ONNX Runtime to a single CPU thread (as was done for the i7 and Raspberry Pi tests), edit `mmdeploy/mmdeploy/backend/onnxruntime/wrapper.py` and add the following before the line where the `InferenceSession` is created (a standalone example follows below):
  - `session_options.intra_op_num_threads = 1`
  - `session_options.inter_op_num_threads = 1`
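For context, a minimal standalone example of what those two options do (the model path is a placeholder; inside MMDeploy the options are set on the existing `session_options` object in `wrapper.py`):

```python
import onnxruntime as ort

# Restrict ONNX Runtime to a single CPU thread, as used for the i7 and
# Raspberry Pi measurements.
session_options = ort.SessionOptions()
session_options.intra_op_num_threads = 1
session_options.inter_op_num_threads = 1

session = ort.InferenceSession("model.onnx", session_options,
                               providers=["CPUExecutionProvider"])
```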
- To get precision and recall values after a test of a deployed model (saved to a file `coco_eval.json`):
  - Edit `mmdetection/mmdet/evaluation/metrics/coco_metric.py`
  - In `compute_metrics()`, after `coco_eval.summarize()` under the `else:` branch, add:
```python
# save the precisions, iou, recall, cls and so on to a file
import json
import os
path = os.environ.get('RESULTS_FILEPATH')
if path is not None:
    results = {
        'precision': coco_eval.eval['precision'].tolist(),
        'recall': coco_eval.eval['recall'].tolist(),
        'scores': coco_eval.eval['scores'].tolist(),
        'iouThrs': coco_eval.eval["params"].iouThrs.tolist(),
        'recThrs': coco_eval.eval["params"].recThrs.tolist(),
        'catIds': [int(i) for i in coco_eval.eval["params"].catIds],
        'maxDets': [int(i) for i in coco_eval.eval["params"].maxDets],
        'areaRng': [[int(i) for i in l] for l in coco_eval.eval["params"].areaRng],
        'areaRngLbl': coco_eval.eval["params"].areaRngLbl,
    }
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
    print(f"Results saved to {path}")
else:
    print("RESULTS_FILEPATH not set, so results not saved")
```
- Before running, `export RESULTS_FILEPATH=.../coco_eval.json`
  - Preferably point it into a work dir, as that's where our scripts expect it
- Run test example:
```
for d in ./working_dir_yolov8_*; do export RESULTS_FILEPATH=/home/tedro/Desktop/proj/$d/coco_eval.json; python3 test_deployed.py deploy/config_onnxruntime_dynamic_batch.py $d onnxruntime_dynamic_batch.onnx cuda -b 8; done
```
- Example of then plotting a PR curve (see also the sketch below):
```
python3 pr_curve.py working_dir_yolov8_f_352x192/ working_dir_yolov8_f_384x224/ working_dir_yolov8_f_448x256/ working_dir_yolov8_f_512x288/
```
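For reference, a minimal sketch of turning the saved arrays into a PR curve (`pr_curve.py` is the real script; the precision array follows the pycocotools layout `[iouThrs, recThrs, catIds, areaRng, maxDets]`, with -1 marking missing entries):

```python
import json
import numpy as np
import matplotlib.pyplot as plt

with open("working_dir_yolov8_f_512x288/coco_eval.json") as f:
    ev = json.load(f)

precision = np.array(ev["precision"])  # [iouThrs, recThrs, catIds, areaRng, maxDets]
rec_thrs = np.array(ev["recThrs"])

iou_idx, area_idx, maxdet_idx = 0, 0, -1   # IoU=0.5, area "all", largest maxDets
for cat_pos, cat_id in enumerate(ev["catIds"]):
    p = precision[iou_idx, :, cat_pos, area_idx, maxdet_idx]
    valid = p > -1
    plt.plot(rec_thrs[valid], p[valid], label=f"class {cat_id}")

plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```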
Provided in `requirements.txt`, however:
- Installing the OpenMMLab libraries by cloning them and running `pip install -e .` is recommended, because many of their files need to be modified. MMYOLO and MMDeploy need to be installed this way because we use tools from their repositories.
- Newer onnxruntime should be okay. Be sure to use the GPU version (onnxruntime-gpu) for inference using CUDA.
- To use Python 3.6, some libraries (protobuf, MMYOLO, MMCV) need their `setup.py` edited to allow Python 3.6. Not a great solution, but no compatibility issues were encountered
- Yes, when training, deploying and so on, MM libraries output a million warnings
`IndexError: tensors used as indices must be long, byte or bool tensors`
- A problem with MMDetection; occurred on the Raspberry Pi
- Fix: in `/home/pi/bp/mmdetection/mmdet/structures/bbox/base_boxes.py`, change the line containing `bboxes = bboxes[index]` to `bboxes = bboxes[index.long()]` (a minimal reproduction is shown below)
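A minimal reproduction of what the cast does (illustrative only):

```python
import torch

boxes = torch.rand(5, 4)
index = torch.tensor([0, 2], dtype=torch.int32)

# On some PyTorch builds (the FAQ entry above hit this on the Raspberry Pi),
# indexing with a non-int64 integer tensor raises the IndexError above;
# casting the index to int64 avoids it.
selected = boxes[index.long()]
print(selected.shape)  # torch.Size([2, 4])
```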
`operation.cpp:203:DCHECK(!i->is_use_only()) failed.` on Jetsons:
- Solved by using an older JetPack: 4.6.3 (an older TensorRT version might work too)
- Doesn't seem to affect static models, though
- The problem is in the tensorrt pip package, the `libnvinfer.so.8` file
- TensorRT 8.5.2.2 didn't work; 8.2.1.8 and 8.2.1.9 worked (JP 5.1.1 vs JP 4.6.3)
- One of our scripts isn't working? Try running it from the root repo path or from the dir it is in :/
`TensorRT is not available, please install TensorRT and build TensorRT custom ops first.`
- Probably needs installing the TensorRT libs: `sudo apt install -y tensorrt tensorrt-dev tensorrt-libs nvidia-tensorrt nvidia-tensorrt-dev`
`module 'torch.distributed' has no attribute 'ReduceOp'`
- You downloaded a bad version of PyTorch. Follow this README for working PyTorch downloads
`AttributeError: 'MMLogger' object has no attribute '_cache'`
- MMEngine bug. Explained in this README (search for `_cache`)
- `mmdetection/mmdet/datasets/coco.py` might need editing (I don't remember the error message):
  - Instead of: `self.cat_ids = self.coco.get_cat_ids(cat_names=...`
  - It needs: `self.cat_ids = self.coco.get_cat_ids()`
- Windows: `io.UnsupportedOperation "fileno"`
  - In MMEngine's file `collect_env.py`, line 117 needs `__stdout__` instead of `stdout`
- `list index out of range` after validation? `mmdet/evaluation/metrics/coco_metric.py` needs editing around line 216 for some reason:
  - Before line 212, add `if len(self.cat_ids) > label:` (and indent the next code block)
- `pad_param is not found in results` - a problem with the transformation pipeline in the config. Shouldn't occur with our config files
- `visualize_pipeline.py` is too slow? That's because MMDet's `tools/misc/browse_dataset.py` is too slow. Fix:
  - In `mmengine/structures/base_data_element.py`, change `meta = copy.deepcopy(metainfo)` to `meta = metainfo`
  - Not guaranteed! Be careful with this
- Problem with unused parameters when training?
- pytorch/pytorch#43259
- open-mmlab/mmdetection#7298
- Put this in the model config: `find_unused_parameters = True`
- Problem with MMDeploy when deploying or testing? Make sure you built the library right
- It probably needs custom ops for a specific backend (onnx or tensorrt or whatever)
- open-mmlab/mmdeploy#439 (comment)
- Most often, the backend needs to be installed from source and then MMDeploy needs to be built with the custom ops, for example: `cmake .. -DMMDEPLOY_TARGET_BACKENDS="ort" -DONNXRUNTIME_DIR=/home/pi/bp/onnxruntime/build/Linux/Release/`
- Make sure to update LD_LIBRARY_PATH or PATH env variables if it doesn't work by itself
The older JetPack 4.6.3 needs to be used, otherwise (when using 5.1.1) TensorRT will fail to deploy some of the models with the unexplainable error `operation.cpp:203:DCHECK(!i->is_use_only()) failed.` It seems to be a problem with the newer TensorRT version, as the error originates from `libnvinfer`.
Keep in mind that quantization to INT8 doesn't work on the Nano (or it might work, but the Nano's hardware is not optimized for it).
- https://github.com/open-mmlab/mmdeploy/blob/master/docs/en/01-how-to-build/jetsons.md
  - Conda is not needed; the inference SDK is not needed
- Basic:
```
sudo apt update && sudo apt -y upgrade && sudo apt -y autoremove
sudo apt install -y python3-pip python3-dev python3-setuptools python3-wheel libpython3-dev
if ! [ -f /usr/bin/pip ]; then sudo ln -s /usr/bin/pip3 /usr/bin/pip; fi
sudo apt install -y protobuf-compiler libprotobuf-dev libssl-dev libjpeg-dev zlib1g-dev libavcodec-dev libavformat-dev libswscale-dev libopenblas-base libopenmpi-dev libopenblas-dev vim tree htop tcpdump ncdu pkg-config libhdf5-dev libspdlog-dev build-essential software-properties-common cython3 tensorrt tensorrt-dev tensorrt-libs nvidia-tensorrt nvidia-tensorrt-dev libgeos-dev tmux
```
- ! `tensorrt-dev`, `tensorrt-libs` and `nvidia-tensorrt-dev` are not available on every device. They are probably only needed on the Xavier NX with JP 5.1.1, so if you get an error, remove these from the command
- `pip install --upgrade pip`
- JP 5.1.1: `pip install numpy==1.21.0 protobuf==3.20.2 matplotlib==3.7.1`
- JP 4.6.1:
```
pip install numpy==1.19.4
pip install matplotlib==3.3.4
```
```
# Protobuf
if ! [ -d ~/protobuf ]; then git clone --recursive https://github.com/protocolbuffers/protobuf ~/protobuf; fi
cd ~/protobuf && git checkout tags/v3.20.2 && git submodule update --init --recursive
cd ~/protobuf && ./autogen.sh && ./configure && make -j`nproc`
sudo make install
sed -i "s/>=3.7/>=3.6/g" ~/protobuf/python/setup.py
pip install -e ~/protobuf/python/
```
- Matplotlib might not work on NX and AGX with JP 4.6.3
- `pip install pycuda aiohttp scipy versioned-hdf5 packaging opencv-python Pillow tqdm`
- On JP 4.6.3, PyCUDA won't install
  - It couldn't find `xlocale.h`; `sudo ln -s /usr/include/locale.h /usr/include/xlocale.h` helps
- `sudo pip3 install -U jetson-stats`
- GitHub repos:
```
if ! [ -d ~/vision ]; then git clone --recursive https://github.com/pytorch/vision ~/vision; fi
if ! [ -d ~/mmcv ]; then git clone --recursive https://github.com/open-mmlab/mmcv ~/mmcv; fi
if ! [ -d ~/mmdeploy ]; then git clone --recursive https://github.com/open-mmlab/mmdeploy ~/mmdeploy; fi
if ! [ -d ~/mmdetection ]; then git clone --recursive https://github.com/open-mmlab/mmdetection ~/mmdetection; fi
if ! [ -d ~/mmengine ]; then git clone --recursive https://github.com/open-mmlab/mmengine ~/mmengine; fi
if ! [ -d ~/mmyolo ]; then git clone --recursive https://github.com/open-mmlab/mmyolo ~/mmyolo; fi
cd ~/vision; git checkout tags/v0.11.1
cd ~/mmcv; git checkout tags/v2.0.0
cd ~/mmdeploy; git checkout tags/v1.0.0
cd ~/mmdetection; git checkout tags/v3.0.0
cd ~/mmengine; git checkout tags/v0.7.2
cd ~/mmyolo; git checkout tags/v0.4.0
```
- Update .bashrc:
```
echo 'export PATH="$PATH:/usr/local/cuda/bin"' >> ~/.bashrc
echo 'export PATH="$PATH:$HOME/.local/bin"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"' >> ~/.bashrc
echo 'export OPENBLAS_CORETYPE=ARMV8' >> ~/.bashrc
echo 'export CUDACXX="/usr/local/cuda/bin/nvcc"' >> ~/.bashrc
echo 'export TENSORRT_DIR=/usr/include/aarch64-linux-gnu' >> ~/.bashrc
echo 'export PPLCV_DIR=~/ppl.cv/' >> ~/.bashrc
```
- CMake:
```
sudo apt-get purge cmake -y
export CMAKE_VER=3.23.1
export ARCH=aarch64
wget https://github.com/Kitware/CMake/releases/download/v${CMAKE_VER}/cmake-${CMAKE_VER}-linux-${ARCH}.sh
chmod +x cmake-${CMAKE_VER}-linux-${ARCH}.sh
sudo ./cmake-${CMAKE_VER}-linux-${ARCH}.sh --prefix=/usr --skip-license
cmake --version
```
- JP 5.1.1:
```
wget https://nvidia.box.com/shared/static/ssf2v7pf5i245fk4i0q926hy4imzs2ph.whl -O ~/torch-1.11.0-cp38-cp38-linux_aarch64.whl
pip install ~/torch-1.11.0-cp38-cp38-linux_aarch64.whl
```
- JP 4.6.1:
```
wget https://nvidia.box.com/shared/static/fjtbno0vpo676a25cgvuqc1wty0fkkg6.whl -O ~/torch-1.10.0-cp36-cp36m-linux_aarch64.whl
pip install ~/torch-1.10.0-cp36-cp36m-linux_aarch64.whl
```
- `pip install -e ~/vision`
- ppl.cv:
```
if ! [ -d $PPLCV_DIR ]; then git clone https://github.com/openppl-public/ppl.cv.git $PPLCV_DIR; fi
cd $PPLCV_DIR
./build.sh cuda
```
- Nano MMLabs: ignore requirements:
```
sed -i "s/>=3.7/>=3.6/g" ~/mmcv/setup.py
sed -i "s/>=3.7/>=3.6/g" ~/mmengine/setup.py
```
- JP 4.6.3 MMLabs: comment out lines 290+291 in `~/mmengine/mmengine/logging/logger.py`
  - Lines that should be commented out:
```
for logger in MMLogger._instance_dict.values():
    logger._cache.clear()
```
- MMLabs:
```
MMCV_WITH_OPS=1 pip install -v -e ~/mmcv/
pip install -e ~/mmengine/
pip install -e ~/mmdetection/
pip install -e ~/mmyolo/
cd ~/mmdeploy
mkdir -p build && cd build
cmake .. -DMMDEPLOY_TARGET_BACKENDS="trt"
make -j`nproc` && make install
pip install -e ~/mmdeploy/
```
- ONNX Runtime with `CUDAExecutionProvider`:
```
pip uninstall -y onnxruntime
git clone --recursive https://github.com/microsoft/onnxruntime ~/onnxruntime
cd ~/onnxruntime/
git checkout tags/v1.11.0
./build.sh --config Release --update --build --parallel --build_wheel --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu
pip install ~/onnxruntime/build/Linux/Release/dist/onnxruntime_gpu-*-linux_aarch64.whl
```
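After installing the wheel, a quick sanity check that the GPU provider is actually available:

```python
import onnxruntime as ort

# CUDAExecutionProvider should appear in this list if the GPU build succeeded.
print(ort.get_available_providers())
```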
- If we don't want to create a new `calib_file` for every deployed model (but instead reuse a calibration file for each model with the same input resolution), edit `~/mmdeploy/mmdeploy/utils/config_utils.py:340` from:
```python
create_calib = calib_config.get('create_calib', False)
if create_calib:
    calib_filename = calib_config.get('calib_file', 'calib_file.h5')
    return calib_filename
else:
    return None
```
  to:
```python
return calib_config.get('calib_file', 'calib_file.h5')
```
- Additionally, the matplotlib version needs to be checked, because some package might have reinstalled an older version (2.1.1 is old)
- DONE
- Packages used and working on Jetsons (src means built/installed from source):
```
Package          Nano        Xavier NX   AGX Xavier
mmcv             2.0.0 src   2.0.0 src   2.0.0 src
mmdeploy         1.0.0 src   1.0.0 src   1.0.0 src
mmdet            3.0.0 src   3.0.0 src   3.0.0 src
mmengine         0.7.2 src   0.7.2 src   0.7.2 src
mmyolo           0.4.0 src   0.4.0 src   0.4.0 src
numpy            1.19.4      1.21.0      1.21.0
onnx             1.13.1      1.13.1      1.13.1
onnxruntime_gpu  1.11.0      1.11.0      1.11.0
protobuf         3.20.2 src  3.20.2      3.20.2 src
pycuda           2022.1      2022.2.2    2022.2.2
scipy            0.19.1      1.3.3       1.3.3
torch            1.10.0      1.11.0      1.11.0
torchvision      0.11.0 src  0.11.0 src  none
```
- For ONNX Runtime inference on Jetsons (to utilize the GPU), the GPU version needs to be built from source:
  - https://onnxruntime.ai/docs/build/eps.html#nvidia-jetson-tx1tx2nanoxavier
  - Be sure to uninstall `onnxruntime` if present
  - It might need `libpython3.8-dev` or `libpython3.6-dev`
- PyTorch install on Jetsons...
  - https://forums.developer.nvidia.com/t/pytorch-for-jetson/72048
  - Other sources just don't work...
  - Nano: v1.10.1; NX and AGX (if JP 5.1.1 is used, I guess): v1.11.0
- OpenMMLabs on Nano? (JP 4.6.3)
  - Needs protobuf 3.20.2 from source, I guess
    - If it won't work with python3.6, `setup.py` needs to be edited manually to accept the old Python
  - We can then install MMDeploy 1.0.0
    - This installs MMEngine 0.4.0 - we will upgrade this later
  - MMCV 2.0.0 (or maybe higher)
    - Needs python>=3.7, so we need to modify `setup.py` to work with 3.6...
    - Don't forget to build with `MMCV_WITH_OPS`
  - MMEngine needs to be 0.7.2 (or maybe higher)
    - Again, needs python>=3.7, so `setup.py` needs to be edited
  - Lines 290 and 291 in `mmengine/logging/logger.py` need to be commented out. Otherwise it throws `AttributeError: 'MMLogger' object has no attribute '_cache'`
  - Then, libgeos needs to be installed: `sudo apt-get install libgeos-dev`
    - Otherwise: `OSError: Could not find lib geos_c or load any of its variants ['libgeos_c.so.1', 'libgeos_c.so'].`
  - Finally, install a newer matplotlib (3.3.4) if an older version was installed