RGB vs. BGR

I notice in the notebooks that images are converted from BGR (default when opening with cv2) to RGB, before being fed to the detector. It seems that the results are okay with BGR, but slightly different. Is it a mandatory step?

I ask this question because I have used another detector before, and I was surprised to see that one function actually expected BGR:

https://github.com/1adrianb/face-alignment/blob/250f4efea43ec7ef13ba7f5c15d80f9b85828bd1/face_alignment/api.py#L134-L141

The RetinaFace repo this project was based on trained its model on BGR images.

See https://github.com/biubug6/Pytorch_Retinaface/blob/b984b4b775b2c4dced95c1eadd195a5c7d32a60b/test_fddb.py#L106 through line 120, where the processed image (resized, normalized, converted to a PyTorch tensor, reshaped HWC -> CHW -> NCHW with N == 1, sent to the GPU...) is fed into the model. The BGR -> RGB conversion is nowhere to be found.
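To make the order of those steps concrete, here is a minimal NumPy-only sketch of that kind of preprocessing. It is not the linked script: the function name `preprocess_bgr` and the mean values are assumptions chosen for illustration, and resizing and the transfer to the GPU are left out. The point is that nothing in the chain reorders the channels, so whatever order the loader produced is what the model sees.

```python
import numpy as np

# Assumed per-channel means in B, G, R order; illustrative values only.
BGR_MEAN = np.array([104.0, 117.0, 123.0], dtype=np.float32)


def preprocess_bgr(image_bgr: np.ndarray) -> np.ndarray:
    """Turn an HWC uint8 image into a mean-subtracted NCHW float32 batch of one."""
    img = image_bgr.astype(np.float32)
    img -= BGR_MEAN                # broadcasts over H and W; channel order is untouched
    img = img.transpose(2, 0, 1)   # HWC -> CHW
    return img[np.newaxis, ...]    # CHW -> NCHW with N == 1


# A 4x6 dummy image that is pure blue when interpreted as BGR.
dummy = np.zeros((4, 6, 3), dtype=np.uint8)
dummy[..., 0] = 255

batch = preprocess_bgr(dummy)
print(batch.shape)  # (1, 3, 4, 6)
```

Channel 0 of `batch` still holds the blue plane, which is what a BGR-trained model expects.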

Keep in mind that the only thing cv2.IMREAD_COLOR does is make sure 3 channels are loaded (for example, a grayscale image is 'converted' to a color image by copying its single channel). It does not load the image as RGB.
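The grayscale case can be sketched without OpenCV. The helper `gray_to_three_channels` below is hypothetical and only mimics the "copy the single channel" behaviour described above; it is not OpenCV's implementation. It also shows why channel order is irrelevant for such images: all three planes are identical, so reversing them changes nothing.

```python
import numpy as np


def gray_to_three_channels(gray: np.ndarray) -> np.ndarray:
    """Replicate an HxW grayscale image into an HxWx3 array (hypothetical helper)."""
    return np.repeat(gray[:, :, np.newaxis], 3, axis=2)


gray = np.array([[0, 128], [200, 255]], dtype=np.uint8)
color = gray_to_three_channels(gray)

# Reversing the last axis swaps the first and third channels (BGR <-> RGB).
reversed_color = color[:, :, ::-1]

print(color.shape)                            # (2, 2, 3)
print(np.array_equal(color, reversed_color))  # True
```

For a real color image the two arrays would differ, which is where the BGR/RGB question starts to matter.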

Other repositories/models (sometimes state of the art; MTCNN comes to mind, note the 'USAGE' part of its README) require RGB images as model input (keep in mind that PIL/Pillow images converted to a numpy array are RGB).

But this repo IS trained using RGB. Even more so, it doesn't use OpenCV at all but Pillow, as seen in the dataloader here:

retinaface/retinaface/inference.py

Line 66 in 0bfa402

image = np.array(Image.open(image_path))

You might think that self.transform does the conversion, but it doesn't, as seen in the provided YAML config (the test-aug part for this example).

It's arbitrary. It is, however, clear that models requiring BGR input were trained with, and are intended to be used with, OpenCV (Pillow does have minor differences when decoding JPEG).

In most cases, RGB vs. BGR shouldn't make a huge difference, since faces are easily detectable in both, but for a slight average boost in detection accuracy, feed the model the way it was trained/intended: in this case RGB/Pillow. Some training pipelines do include BGR<->RGB augmentations; those models should be used as tested for better accuracy (or fewer false positives, mAP, AP, TPR, FPR, whichever metric you find more suitable).
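One way to apply that advice is a small guard that puts the image into whatever order the model was trained on before inference. This is a sketch under stated assumptions: `to_expected_order` is a hypothetical helper, not part of this repo or of OpenCV, and it uses plain NumPy slicing, which gives the same result as cv2.cvtColor with cv2.COLOR_BGR2RGB for 3-channel images.

```python
import numpy as np

VALID_ORDERS = ("RGB", "BGR")


def to_expected_order(image: np.ndarray, source_order: str, expected_order: str) -> np.ndarray:
    """Return an HxWx3 image in the channel order the model expects (hypothetical helper)."""
    if source_order not in VALID_ORDERS or expected_order not in VALID_ORDERS:
        raise ValueError("channel order must be 'RGB' or 'BGR'")
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    if source_order == expected_order:
        return image
    # Reverse the channel axis; copy so downstream code gets a contiguous array.
    return np.ascontiguousarray(image[:, :, ::-1])


# A single pure-blue pixel as cv2.imread would return it (B, G, R).
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)

rgb_pixel = to_expected_order(bgr_pixel, source_order="BGR", expected_order="RGB")
print(rgb_pixel[0, 0].tolist())  # [0, 0, 255]
```

With this repo's model, an image loaded through cv2 would be passed with source_order="BGR" and expected_order="RGB"; an image loaded through Pillow needs no change.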

Thanks. This makes sense! :)