Preprocessing of images to run inference
isa-tr opened this issue · 4 comments
Hello, thank you very much for your work.
I am trying to preprocess a batch of images (I have my own dataset) the way you prepared your data. I'm following the notebook train_emotions.ipynb, since it is written in TensorFlow and that is the framework I'm using.
I have a question about the preprocessing steps, so I would like to ask you to confirm them. These are the steps I'm following; let me know if I'm right or if something is missing:
- I already have my images with the faces detected and cropped, i.e., I have a dataset full of faces like this
- img = cv2.imread(img_path)
- img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
- img = cv2.resize(img, (224, 224))
- Then your notebook shows you apply a normalization:

def mobilenet_preprocess_input(x, **kwargs):
    x[..., 0] -= 103.939
    x[..., 1] -= 116.779
    x[..., 2] -= 123.68
    return x

preprocessing_function = mobilenet_preprocess_input
Here I am having an issue because the in-place subtraction cannot cast the float result back into the integer image array, so I changed it to:
def mobilenet_preprocess_input(x, **kwargs):
    x[..., 0] = x[..., 0] - 103.939
    x[..., 1] = x[..., 1] - 116.779
    x[..., 2] = x[..., 2] - 123.68
    return x

preprocessing_function = mobilenet_preprocess_input
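For what it's worth, the original casting error comes from applying an in-place float subtraction to a uint8 image array, and the element-wise reassignment above still truncates the float result when it is stored back into the integer array. A minimal sketch (plain NumPy, with illustrative array names) of casting to float32 up front, which avoids both problems:

```python
import numpy as np

def mobilenet_preprocess_input(x, **kwargs):
    # Cast to float32 first so the per-channel mean subtraction
    # neither raises a casting error nor truncates to integers.
    x = x.astype(np.float32)
    x[..., 0] -= 103.939
    x[..., 1] -= 116.779
    x[..., 2] -= 123.68
    return x

# Stand-in for a loaded, resized face crop (all pixels set to 120).
img = np.full((224, 224, 3), 120, dtype=np.uint8)
out = mobilenet_preprocess_input(img)
print(out.dtype, out[0, 0])  # float32 array, fractional values preserved
```

Note that `astype` also returns a new array, so the caller's original uint8 image is left untouched.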
So, let me know if the process I'm following is correct or if there's something missing.
Thank you!
Thanks for your question! Your images look nice; I believe you could use the models from my repository. The preprocessing function above is appropriate for my TensorFlow model (mobilenet_7.h5) only. If you want to use the more accurate PyTorch models, the preprocessing is slightly different, something similar to https://github.com/HSE-asavchenko/face-emotion-recognition/blob/main/python-package/hsemotion/facial_emotions.py#L39
Thank you for your answer! Yep, I'm also checking the preprocessing needed to use the PyTorch pre-trained models.
As far as I can see, you use these transformations:
IMG_SIZE = 224
preprocess = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
Am I right?
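As a sanity check, the ToTensor + Normalize steps above amount to the following NumPy arithmetic on an RGB image (a sketch with illustrative names; the Resize step is omitted):

```python
import numpy as np

# ImageNet channel statistics used by torchvision's Normalize.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_like_torchvision(img_hwc_uint8):
    # ToTensor: scale uint8 [0, 255] to float [0, 1].
    x = img_hwc_uint8.astype(np.float32) / 255.0
    # Normalize: per-channel (x - mean) / std, broadcasting over H and W.
    x = (x - MEAN) / STD
    # HWC -> CHW, matching the tensor layout PyTorch models expect.
    return np.transpose(x, (2, 0, 1))

img = np.full((224, 224, 3), 128, dtype=np.uint8)
print(normalize_like_torchvision(img).shape)  # (3, 224, 224)
```

This can be handy for verifying that a custom data loader produces the same tensors as the torchvision pipeline.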
And thank you again for answering my question :D
You're correct if you use the enet_b0 models. I personally recommend them because of their stable results across different datasets. You could also try the enet_b2 models for potentially higher accuracy, but then it is better to increase the resolution of the input image by setting IMG_SIZE=260. BTW, all technical details for the PyTorch models are encapsulated in the hsemotion Python package, which you can use out of the box on the facial images from your dataset.
Thanks!!