An extended investigation of gender bias in the DeepFace face recognition framework developed by Sefik Ilkin Serengil
In this experiment, we apply perturbations to the LFW
(Labeled Faces in the Wild) benchmark dataset and compare the performance differences between male and female subjects.
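As a concrete illustration, one common perturbation is additive Gaussian noise applied to a face image before verification. The sketch below uses NumPy; the noise level and the choice of Gaussian noise are illustrative assumptions, not necessarily the exact perturbations used in this experiment.

```python
import numpy as np

def perturb_gaussian(image: np.ndarray, std: float = 10.0, seed: int = 0) -> np.ndarray:
    """Return a copy of `image` with additive Gaussian noise (std in pixel units)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, std, size=image.shape)
    noisy = image.astype(np.float64) + noise
    # Clip back into the valid 8-bit pixel range before casting.
    return np.clip(noisy, 0, 255).astype(np.uint8)

# Example: perturb a dummy 64x64 RGB "face" image.
img = np.full((64, 64, 3), 128, dtype=np.uint8)
noisy_img = perturb_gaussian(img)
```

The perturbed copy keeps the original shape and dtype, so it can be fed to the same verification pipeline as the clean image.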
We recommend setting up a virtual environment so that each project's dependencies live in their own isolated environment. In our experiment, we use Anaconda
to manage it.
After installing Anaconda or a similar package manager, run the command below to create a virtual environment. The environment name "envname" is up to you, and the Python version should be 3.6 or above.
$ conda create -n envname python=x.x anaconda
With the virtual environment created, activate it and continue the setup with the command below:
$ conda activate envname
To deactivate it, simply run:
$ conda deactivate
The programming language used in this project is Python, version 3.9.12. You may refer to Python
for installation details.
The deepface framework can be downloaded from PyPI
. Alternatively, you may run the command below if pip is installed on your machine.
$ pip install deepface
In this project, we use Jupyter Notebook
as our web-based interactive computing platform to document our findings and test the DeepFace model.
We recommend using Visual Studio Code
to run our program; however, feel free to use your preferred code editor if you are more familiar with it.
Please clone the project repository and run the following command to install the other package dependencies used in this project:
$ pip install -r requirements.txt
Since GitHub limits file sizes to 50 MB, we are unable to include the necessary datasets in this repository. However, you may download them from this GoogleDrive
link.
To test with a custom dataset, the expected dataset format is shown below:
root
└── data
    └── LFW_gender
        ├── Female
        │   ├── Angelina_Jolie
        │   │   ├── Angelina_Jolie_0001.jpg
        │   │   └── Angelina_Jolie_0002.jpg
        │   └── Angie_Arzola
        │       └── Angie_Arzola_0001.jpg
        └── Male
            └── Aaron_Peirsol
                ├── Aaron_Peirsol_0001.jpg
                └── Aaron_Peirsol_0002.jpg
Two key criteria for the dataset format are:
- Split by gender
- Labelled by identity
This notebook is responsible for testing the existence of gender bias in DeepFace. All test results are saved to the respective CSV files.
- Import relevant libraries & load data
- Show Deepface documented results
- Parameter settings
  - Dataset selection
  - Model selection
  - Metric and backend configuration
- Gender testing
  - Deepface test with the entire LFW benchmark dataset
  - Deepface test with gender-split and perturbed datasets
- Test results output
  - Save results to the respective CSV files
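The per-gender comparison in the notebook ultimately reduces to aggregating verification outcomes (e.g. the match decisions returned by DeepFace.verify) separately for each gender. A simplified sketch of that aggregation step, using hypothetical outcome records rather than real model output, is:

```python
def accuracy_by_gender(results):
    """results: iterable of (gender, predicted_match, actual_match) tuples.
    Returns {gender: accuracy} over the verification outcomes."""
    totals, correct = {}, {}
    for gender, predicted, actual in results:
        totals[gender] = totals.get(gender, 0) + 1
        if predicted == actual:
            correct[gender] = correct.get(gender, 0) + 1
    return {g: correct.get(g, 0) / totals[g] for g in totals}

# Hypothetical outcomes: (gender, model said "same person", ground truth)
outcomes = [
    ("Female", True, True), ("Female", False, True),
    ("Male", True, True), ("Male", True, True),
]
print(accuracy_by_gender(outcomes))  # → {'Female': 0.5, 'Male': 1.0}
```

A gap between the two accuracies on otherwise comparable data is the signal the notebook saves to CSV for further analysis.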
The main script launches the project's GUI, which lets users run various datasets against state-of-the-art face recognition models.
- Dataset and model selection as dropdown menus
- Sample testing and Full testing buttons
  - Results are output to the respective labels
  - Gender label for the single-image sample test
  - Accuracy label for both tests
- Save and Compare Previous buttons
  - Saves results under the current output
  - Compares the previously saved accuracy with the current accuracy and outputs the difference
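The Save / Compare Previous behaviour amounts to storing the last accuracy and reporting the delta against the current run. The sketch below shows just that comparison logic; the class and method names are illustrative, not the project's actual GUI code.

```python
class ResultStore:
    """Holds the last saved accuracy and compares it against a new run."""

    def __init__(self):
        self.saved = None

    def save(self, accuracy: float) -> None:
        self.saved = accuracy

    def compare(self, current: float):
        """Return current - saved (positive means improvement),
        or None if nothing has been saved yet."""
        if self.saved is None:
            return None
        return round(current - self.saved, 4)

store = ResultStore()
store.save(0.92)            # user presses Save after a run
diff = store.compare(0.95)  # → 0.03 on the next run
```

In the GUI, the returned difference is what gets written into the comparison label below the current output.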
Serengil, Sefik Ilkin, and Alper Ozpinar. 2021. “HyperExtended LightFace: A Facial Attribute Analysis Framework.” In 2021 International Conference on Engineering and Emerging Technologies (ICEET), 1–4. IEEE. https://doi.org/10.1109/ICEET53442.2021.9659697.
Serengil, Sefik Ilkin, and Alper Ozpinar. 2020. “LightFace: A Hybrid Deep Face Recognition Framework.” In 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), 23–27. IEEE. https://doi.org/10.1109/ASYU50717.2020.9259802.
Huang, Gary B., Manu Ramesh, Tamara Berg, and Erik Learned-Miller. 2007. “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments.” 07-49. University of Massachusetts, Amherst.
Pu, M., M. Y. Kuan, N. T. Lim, C. Y. Chong, and M. K. Lim. 2022. “Fairness Evaluation in Deepfake Detection Models Using Metamorphic Testing.” In 2022 IEEE/ACM 7th International Workshop on Metamorphic Testing (MET), 7–14. https://doi.org/10.1145/3524846.3527337.