- Project Requirements
- Project Purpose
- Model Information
- Cloning This Repo
- Coco Dataset Information
- Running The Model
- Training
- Training Parameters
- Predicting
- Prediction Parameters
- Live Feed
- Live Feed Parameters
- Things Read
This project was written in Python. At the time of this README update, Python 3.8.10
was used, but any Python version up to 3.9 should work.
The following libraries with their versions are needed to completely run this project:
PyTorch: 1.11.0
PyCocoTools: 2.0.4
NumPy: 1.22.3
SciKit Image: 0.18.2
Pillow: 8.2.0
Matplotlib: 3.4.2
CV2: 4.5.2.54
click: 8.1.3
You can install all of these libraries using the following commands:

```
pip install torch
pip install pycocotools
pip install numpy
pip install scikit-image
pip install pillow
pip install matplotlib
pip install opencv-python
pip install click
```
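If you want to pin the exact versions listed above, a requirements file can be used (a sketch; this file is not part of the repo):

```text
torch==1.11.0
pycocotools==2.0.4
numpy==1.22.3
scikit-image==0.18.2
pillow==8.2.0
matplotlib==3.4.2
opencv-python==4.5.2.54
click==8.1.3
```

Save it as requirements.txt and install everything with pip install -r requirements.txt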
To install PyTorch with Cuda support, go to the link below: https://pytorch.org/
The purpose of this project is to code the YOLOX algorithm from scratch in order to learn more about how it works and to put the algorithm in a more readable format for other people to better understand the algorithm.
The original paper can be found from the link below: https://arxiv.org/abs/2107.08430
The original repo can be found using the link below: https://github.com/Megvii-BaseDetection/YOLOX
When reading over the YOLOX paper, I noticed it was missing a lot of content that was assumed knowledge from other papers like YOLOv3, OTA, FCOS, and others. Since this algorithm does better than the famous YOLO algorithms but does so without anchors, it is important to understand how it works in order to improve bounding box algorithms in an anchor-free manner. Using this repo, I will attempt to explain how the algorithm works in some sort of article format and will put the links below as I write them:
What is YOLO and What Makes It Special?
How Does YOLOX Work?
SimOTA For Dynamic Label Assignment
Mosaic and Mixup For Data Augmentation
What problem is the model trying to solve? This model is the first YOLO (You Only Look Once) algorithm to use anchor-free detection. An anchor is basically a predefined bounding box shape that helps the network. Instead of predicting the bounding box directly, previous YOLO algorithms predicted an offset from a predefined anchor box. So, if an anchor box had a length and width of 100 and 50 and the model predicted a length and width of 10 and 15, the bounding box prediction would be an offset from the anchor box, with a final length and width of 110 and 65. More information about anchor boxes can be found in this conversation.
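As a minimal sketch of the arithmetic above (the numbers are the illustrative ones from the paragraph, not real model outputs; real anchor-based heads typically apply scale transforms rather than simple addition):

```python
# Anchor-based prediction: the model outputs an offset, not the box itself.
anchor_l, anchor_w = 100, 50   # predefined anchor length and width
offset_l, offset_w = 10, 15    # hypothetical raw model outputs

box_l = anchor_l + offset_l    # final length: 110
box_w = anchor_w + offset_w    # final width: 65
```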
What's the problem with anchor boxes? Anchor boxes are basically extra parameters. How many anchors should the model use? What should the sizes of the anchors be? These questions lead to more hyperparameter tuning and less diversity in the model.
How does the model solve the anchor box problem? YOLOX simply has the model directly predict the bounding box dimensions instead of an offset from an anchor box. To do this, it uses a decoupled head, unlike other YOLO algorithms. Below is a side-by-side comparison between the YOLOv3 model and the YOLOX model, which can be found in the YOLOX paper:
The final predictions of this model are the following:
- Reg - The predicted bounding box which has 4 values:
- X value of the top-left corner of the bounding box
- Y value of the top-left corner of the bounding box
- Height of the bounding box
- Width of the bounding box
- Cls - The predicted class the model thinks is inside the bounding box. This is a one-hot encoded vector with the same number of elements as there are classes.
- IoU (obj) - The objectness prediction for the predicted bounding box. This is a single value indicating how confident the model is that an object is inside the predicted bounding box.
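Putting the three outputs together, a single prediction can be decoded roughly as follows (a sketch; the flattened array layout and the values here are hypothetical, not the repo's actual tensor format):

```python
import numpy as np

# Hypothetical flattened prediction for a 3-class model:
# [x, y, h, w] regression, 3 class scores, 1 objectness score
pred = np.array([12.0, 30.0, 25.0, 40.0, 0.1, 0.8, 0.1, 0.9])

x, y, h, w = pred[:4]                 # Reg: top-left corner, height, width
cls_id = int(np.argmax(pred[4:7]))    # Cls: index of the most likely class
objectness = float(pred[7])           # IoU (obj): confidence an object exists
```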
More about the model can be found in the articles I wrote.
To clone this repo, use the following command in your computer terminal:

```
git clone https://github.com/gmongaras/YOLOX_From_Scratch.git
```
After cloning the repo, please download the Coco data as specified in the Coco Dataset Information section.
The Coco data should be in the proper format so the model can find it. The directory tree below shows how the repo should be formatted, assuming the val2017 and test2017 data were downloaded:
```
.
├── coco
│   ├── annotations
│   │   ├── captions_train2017.json
│   │   ├── captions_val2017.json
│   │   ├── instances_train2017.json
│   │   ├── instances_val2017.json
│   │   ├── person_keypoints_train2017.json
│   │   └── person_keypoints_val2017.json
│   └── images
│       ├── train2017
│       │   ├── 000000000009.jpg
│       │   ├── 000000000025.jpg
│       │   ├── ...
│       │   └── {more images in the train2017 dataset}
│       └── val2017
│           ├── 000000000139.jpg
│           ├── 000000000285.jpg
│           ├── ...
│           └── {more images in the val2017 dataset}
├── models
│   ├── model - test.pkl
│   └── modelParams - test.json
├── src
│   ├── YOLOX.py
│   └── {all other .py scripts}
├── testData
│   ├── 000000013201.jpg
│   └── {Other images to test on}
├── .gitignore
└── README.md
```
To test and train this particular model, I used the Coco dataset. The Coco dataset has images along with bounding box labels in those images. More information can be found on the Coco website.
In particular, I used the 2017 val and 2017 train data to train/test this model. The data can be found at the following link: https://cocodataset.org/#download
Direct download links can be found below:

- Note: The data takes up about 20 GB of space.
- Uncompress the following and place all images in the ./coco/images/train2017/ directory: http://images.cocodataset.org/zips/test2017.zip
- Uncompress the following and place all images in the ./coco/images/val2017/ directory: http://images.cocodataset.org/zips/val2017.zip
- Uncompress the following and place all annotations in the ./coco/annotations/ directory: http://images.cocodataset.org/annotations/annotations_trainval2017.zip
After downloading the data, your filesystem should match the directory tree shown in the Cloning This Repo section.
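As a quick sanity check, a small script like the following can verify the expected layout (a hypothetical helper, not part of this repo; the paths are taken from the directory tree above):

```python
from pathlib import Path

# Paths the training scripts expect, relative to the repo root
EXPECTED = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/images/train2017",
    "coco/images/val2017",
]

def missing_coco_paths(root="."):
    """Return the expected Coco paths that are missing under root."""
    return [p for p in EXPECTED if not (Path(root) / p).exists()]
```

Running missing_coco_paths() from the repo root should return an empty list once the data is in place.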
Each bounding box has 4 values. These values line up with what we want our model to predict, which are:

1. horizontal (x) value from the left
2. vertical (y) value from the top
3. width of the bounding box
4. height of the bounding box

Values 1 and 2 define the top-left corner of the bounding box, and values 3 and 4 define its width and height.
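For example, converting a Coco-style [x, y, width, height] box into corner coordinates (a small illustrative helper, not a function from this repo):

```python
def coco_box_to_corners(box):
    """Convert a Coco [x, y, width, height] box to (x1, y1, x2, y2) corners."""
    x, y, w, h = box
    return (x, y, x + w, y + h)

# A 30x40 box whose top-left corner is at (10, 20):
coco_box_to_corners([10, 20, 30, 40])  # (10, 20, 40, 60)
```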
Pretrained models can be found using the following Google Drive link: https://drive.google.com/drive/folders/1hXQQgntAAs0DdrcaF8FtR4_nZnhMyvZb?usp=sharing
Please ensure that any models that were downloaded are paired with their parameters. Each model has two files:
- A .pkl file which stores the model data
- A .json file that stores extra configuration information on the model
Both files should go into the ./models/ directory within your local repository.
After the model has been downloaded, ensure the filesystem matches the directory tree shown in the Cloning This Repo section.
There are three different scripts I wrote to run the model: one for training (train.py), one for predicting (predict.py), and one for a live feed (liveFeed.py).
To train the model, first download the data.
To train the model using a pre-trained model, download a pretrained model.
Assuming you now have the data and an optional pre-trained model on your computer, use the following command from the root directory of this repository to begin training the model:
```
python src/train.py --dataDir=[dataDir] --dataType=[dataType] --numToLoad=[numToLoad]
```
Note: Any parameter can be changed by adding --[parameterName] after python src/train.py, where [parameterName] is the name of the parameter you wish to change. Default values are shown in brackets below.
Required:
- dataDir - Location of the COCO dataset
- dataType - The type of data being used in the COCO dataset (ex: val2017)
- numToLoad - The maximum number of data images to load in (use -1 for all)
Model Hyperparameters
- device - [cpu] The device to train the model with:
  - cpu - put everything on the CPU
  - partGPU - put only the model on the GPU
  - fullGPU - put everything on the GPU
- numEpochs - [300] The number of epochs to train the model
- batchSize - [128] The size of each minibatch
- warmupEpochs - [5] Number of epochs before using a lr scheduler
- alpha - [0.01] Initial learning rate
- weightDecay - [0.0005] Weight decay in SGD
- momentum - [0.9] Momentum of SGD
- ImgDim - [256] Resize the images to a square pixel value (can be 1024, 512, or 256)
- augment_per - [0.75] Percent of extra augmented data to generate every epoch
SimOTA Parameters
- q - [20] The number of GIoU values to pick when calculating the k values in SimOTA (k = the number of predictions (supply) each ground truth is assigned)
- r - [5] The radius used to calculate the center prior in SimOTA
- extraCost - [100000.0] The extra cost used in the center prior computation in SimOTA
- SimOta_lambda - [3.0] Balancing factor for the foreground loss in SimOTA
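To make the q parameter concrete, the dynamic-k estimation in SimOTA can be sketched like this (an illustrative simplification of the idea, not the repo's implementation; real SimOTA sums the top-q IoUs between each ground truth and its candidate predictions):

```python
import numpy as np

def dynamic_k(ious, q=20):
    """Estimate k for one ground truth: sum its top-q IoU values with all
    candidate predictions and round down, keeping at least 1."""
    top_q = np.sort(ious)[::-1][:q]
    return max(1, int(top_q.sum()))

# With q=3, the top IoUs 0.9 + 0.8 + 0.7 = 2.4 give k = 2:
dynamic_k(np.array([0.9, 0.8, 0.7, 0.1, 0.05]), q=3)  # 2
```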
Model Save Parameters
- saveDir - [./models] The directory to save models to
- saveName - [model] File to save the model to
- paramSaveName - [modelParams] File to save the model parameters to
- saveSteps - [10] Save the model every "saveSteps" steps
- saveOnBest - [False] True to save the model only if it's the current best model at save time
- overwrite - [False] True to overwrite the existing file when saving. False to make a new file when saving
Model Loading Parameters - used to load a pretrained model and resume training from that checkpoint
- loadModel - [False] True to load in a pretrained model, False otherwise
- loadDir - [./models] The directory to load the model from
- paramLoadName - [modelParams.json] File to load the model parameters from
- loadName - [model.pkl] Filename to load the model from
Loss Function Hyperparameters
- FL_alpha - [4.0] The focal loss alpha parameter
- FL_gamma - [2.0] The focal loss gamma parameter
- reg_weight - [5.0] Constant to weigh the regression loss over other loss
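The two focal loss parameters control how strongly easy examples are down-weighted. A minimal binary sketch of the standard focal loss formula, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t) (not the repo's exact implementation):

```python
import math

def focal_loss(p, y, alpha=4.0, gamma=2.0):
    """Binary focal loss for one predicted probability p and label y (0 or 1)."""
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

# Confident correct predictions contribute far less loss than uncertain ones:
# focal_loss(0.9, 1) is much smaller than focal_loss(0.5, 1)
```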
Other Coco Dataset Parameters
- categories - [""] The categories to load in (leave empty to load all) (Ex: 'cat,dog,person')
To make predictions with the model, download a pretrained model.
Additionally, any images you wish the model to put bounding boxes around should be placed in the ./testData/ directory of this repository. A couple of images are already supplied.
Assuming the pre-trained model was downloaded and is in the correct repository, use the following command from the root directory of this repository to begin making predictions with the model:
```
python src/predict.py --dataDir=[dataDir] --loadDir=[loadDir] --paramLoadName=[paramLoadName] --loadName=[loadName]
```
Note: Any parameter can be changed by adding --[parameterName] after python src/predict.py, where [parameterName] is the name of the parameter you wish to change. Default values are shown in brackets below.
Required
- dataDir - Directory to load data we want the model to make predictions on (use testData for default test images)
- loadDir - The directory to load the model from
- paramLoadName - File to load the model parameters from
- loadName - Filename to load the model from
Other Parameters
- device - [cpu] The device to run the model with (cpu or gpu)
- batchSize - [0] The size of each minibatch of data (use 0 to use a single batch)
Bounding Box Filtering
- removal_threshold - [0.5] Remove predictions whose confidence is below this value
- score_thresh - [0.5] The score threshold for removing boxes in NMS. If a box's score is less than this value, it is removed
- IoU_thresh - [0.1] The IoU threshold for updating scores in NMS. If the IoU is greater than this value, the box's score is updated
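The score update described for IoU_thresh resembles a Soft-NMS-style decay (see the non-max suppression link in Things Read). A sketch of that idea, with an assumed Gaussian decay and a hypothetical sigma parameter:

```python
import math

def soft_nms_score(score, iou, iou_thresh=0.1, sigma=0.5):
    """Decay a box's score when its IoU with a higher-scoring box exceeds
    iou_thresh (Gaussian Soft-NMS style); otherwise leave it unchanged."""
    if iou > iou_thresh:
        return score * math.exp(-(iou ** 2) / sigma)
    return score

# Low overlap keeps the score; high overlap decays it toward 0:
soft_nms_score(0.9, 0.05)  # 0.9 (unchanged)
```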
Focal Loss Function Hyperparameters
- FL_alpha - [4.0] The focal loss alpha parameter
- FL_gamma - [2.0] The focal loss gamma parameter
The Live Feed mode will use a pre-trained model and your device camera to put bounding boxes around your camera environment in real-time.
To use the live feed mode, download a pretrained model.
To run the live feed mode, use the following command from the root directory of this repository:
```
python src/liveFeed.py --loadDir=[loadDir] --paramLoadName=[paramLoadName] --loadName=[loadName]
```
Note: Any parameter can be changed by adding --[parameterName] after python src/liveFeed.py, where [parameterName] is the name of the parameter you wish to change. Default values are shown in brackets below.
Required
- loadDir - The directory to load the model from
- paramLoadName - File to load the model parameters from
- loadName - Filename to load the model from
Other Parameters
- device - [cpu] The device to run the model with (cpu or gpu)
Bounding Box Filtering
- removal_threshold - [0.5] Remove predictions whose confidence is below this value
- score_thresh - [0.5] The score threshold for removing boxes in NMS. If a box's score is less than this value, it is removed
- IoU_thresh - [0.1] The IoU threshold for updating scores in NMS. If the IoU is greater than this value, the box's score is updated
Focal Loss Function Hyperparameters
- FL_alpha - [4.0] The focal loss alpha parameter
- FL_gamma - [2.0] The focal loss gamma parameter
To stop the live feed script, press Esc or Enter.
- Original paper - https://arxiv.org/abs/2107.08430v2
- YOLOv3 - https://arxiv.org/abs/1804.02767v1
- OTA - https://arxiv.org/abs/2103.14259
- Focal loss - https://towardsdatascience.com/multi-class-classification-using-focal-loss-and-lightgbm-a6a6dec28872
- Non-max suppression (Soft-NMS) - https://arxiv.org/pdf/1704.04503.pdf
- GIoU - https://giou.stanford.edu/