
Exploring Blind Preprocessing to Improve Viola-Jones

We study the impact of applying various pre-processing methods on the detection performance of the Viola-Jones detector. The pre-processing has to be blind, so that it can be applied in any scenario, while still giving better performance.
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents
  1. Introduction
  2. Getting Started
  3. Usage
  4. Analysis
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

Introduction


Viola-Jones is an object detection algorithm that was originally developed for detecting faces. We try to improve its detection performance by applying pre-processing to the input images. Our goal is therefore to study the impact of different pre-processing algorithms on the Viola-Jones detector, measure their effect on detection, and present pre-processing that could improve Viola-Jones.

(back to top)

Pre-Processing-Algorithms


I. Lighting Invariance / Intensity Normalization:

  1. SSR (single-scale retinex): find a good sigma value for the Gaussian kernel
  2. MSR (multi-scale retinex): find a good list of sigmas
  3. Frankle-McCann Retinex (in build: Adaptive SSR)
  4. NMBM: estimate the best patch size, h, and patch distance
  5. HOMO (homomorphic filtering): estimate the cutoff frequency (in build: LSSF (Large-Scale Small-Scale Feature Illumination Normalization))
  6. CLAHE: find the best clip limit
  7. HE (histogram equalization)
  8. Logarithmic intensity stretch
  9. Full-scale intensity stretch

II. Deconvolutions / Convolutions:

  1. Blind deconvolution (Richardson-Lucy) (in build: Blind Motion Deblurring)
  2. Total-variation denoising (TV Chambolle): find the best weight parameter (in build: TV Bregman)
  3. Gaussian blur: find the best sigma and kernel size
  4. Bilateral filter: estimate the spatial and luminance sigmas

Data-Sets


I. Face - Data Sets:

  1. BioID
  2. Yale
  3. MIT-CBCL
  4. NMBM
  5. Caltech Face
  6. Orl
  7. SoF

II. Non-Face - Data Sets:

  1. airplane
  2. car
  3. cat
  4. dog
  5. flower
  6. forest
  7. fruit
  8. motorbike (an illustration of the dataset structure is in the data directory: https://github.com/Bhartendu-Kumar/DIP/tree/main/data; make your data folder follow the same structure)

(back to top)

Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

Download the datasets and arrange them as the data folder in this repo. For cleaning and preprocessing, use the scripts in the dataset_helper_scripts directory to convert all images to greyscale and to change extensions such as .gif and .pgm to common ones. Also use these scripts to arrange all images of a dataset into a single folder.
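As a rough illustration of what such a cleanup script could do (a minimal sketch only; the real scripts live in dataset_helper_scripts, and the folder names here are hypothetical):

    # Hypothetical cleanup sketch: convert every image in a dataset folder to
    # greyscale PNG and flatten it into a single output folder.
    import os
    from PIL import Image  # pip install Pillow

    def clean_dataset(src_dir, dst_dir):
        os.makedirs(dst_dir, exist_ok=True)
        for root, _, files in os.walk(src_dir):
            for name in files:
                if not name.lower().endswith((".gif", ".pgm", ".jpg", ".jpeg", ".png", ".bmp")):
                    continue
                img = Image.open(os.path.join(root, name)).convert("L")  # greyscale
                out_name = os.path.splitext(name)[0] + ".png"            # common extension
                img.save(os.path.join(dst_dir, out_name))

    # Example: clean_dataset("raw/BioID", "data/faces/BioID")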

Installation

  1. Clone the repo
    git clone https://github.com/Bhartendu-Kumar/DIP.git
  2. Install required packages
    pip install -r requirements.txt

(back to top)

Usage

The pipeline is as follows (a minimal code sketch is given after the steps):

Step 1: Download the datasets (from their owners' sites) and use the scripts in the dataset_helper_scripts directory to convert all images to greyscale and to change extensions such as .gif and .pgm to common ones. Also use these scripts to arrange all images of a dataset into a single folder.

Step 2: Arrange the data directory as shown: it has 2 subdirectories, faces and non-faces; these in turn contain one sub-directory per dataset, named after the dataset.

Important: for illustration, the "data" folder in this repo already has the desired structure, it just contains only a few images. A sketch of the layout follows below.
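An illustrative listing of the intended layout (dataset names follow the Data-Sets section; the listing is not exhaustive):

    data/
        faces/
            BioID/
            Yale/
            MIT-CBCL/
            ...
        non-faces/
            airplane/
            car/
            cat/
            ...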

Step 3: Run all the "driver scripts" in the root directory of this repository. This will create an "output" directory with the result of all preprocessing methods.

Step 4: Use the scripts in the "csv_scripts" directory to compute the performance metrics from the "output" directory.

Step 5: The "analysis" directory has the final performance CSV files for each pre-processing method.

Step 6: The published report directory has the final findings in the form of CSV files.
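As a rough illustration of what a driver script does, here is a minimal sketch using OpenCV's bundled frontal-face Haar cascade (the preprocess hook and the image path are placeholders; the actual driver scripts also handle batching and metric collection):

    # Minimal sketch of the detection pipeline: preprocess a greyscale image,
    # then run the Viola-Jones (Haar cascade) detector and read its confidences.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect(gray, preprocess=None):
        if preprocess is not None:
            gray = preprocess(gray)          # blind pre-processing step under test
        boxes, reject_levels, level_weights = cascade.detectMultiScale3(
            gray, scaleFactor=1.1, minNeighbors=5, outputRejectLevels=True)
        return boxes, level_weights          # level_weights ~ detection confidence

    # Example (identity preprocessing = plain Viola-Jones baseline);
    # the image path below is hypothetical.
    img = cv2.imread("data/faces/BioID/example.png", cv2.IMREAD_GRAYSCALE)
    boxes, confidences = detect(img)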

(will update this section).

For more examples, please refer to the Documentation

(back to top)

Analysis


Observations

The datasets chosen can be divided into 3 difficulty categories:

  1. Easy: Yale, Caltech, BioID
  2. Medium: MIT-CBCL, ORL
  3. Hard: SoF

1. Gaussian Blur

 There is an inverse relation between precision and recall.

Blurring produces many positives (TP and FP), so it confuses the detector's feature identification.

Sigma: studying the effect of sigma first (keeping kernel size = 5). (Plots: precision vs recall; increase in precision and recall over sigma.)

  • Precision is badly hurt.
  • Performance on hard datasets improves.
  • The confidence value is less than plain Viola-Jones. There is a DISTINCT peak at sigma = 8 (there is something special about this sigma in natural images).
    Overall it is worse than original Viola-Jones!

Kernel size: studying the effect of kernel size (keeping sigma = 1). (Plot: percentage increase in precision and recall over kernel size.)

  • There is a very DISTINCT convergence at kernel size = 11.
  • At k = 11, it is essentially the same as plain Viola-Jones!
  • Confidence at k = 11 is minutely better than original Viola-Jones.
    k = 11 and sigma = 8 are BEST!
    But this is very near to the original algorithm.
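The corresponding pre-processing step is a one-liner in OpenCV; a minimal sketch using the kernel size 11 / sigma 8 setting discussed above:

    import cv2

    def gaussian_blur_preprocess(gray, ksize=11, sigma=8):
        # Best setting found above: 11x11 kernel, sigma = 8.
        return cv2.GaussianBlur(gray, (ksize, ksize), sigma)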

2. HOMO

(Plot: increase in precision and recall.)

It seems conservative, i.e. it does not allow a detection to be classified as a face until it is very sure!

 Precision is BETTER than the original and recall is LESS than the original!  

It increases the difficulty for Viola-Jones: the sure face features have to be there.

  • Cutoff frequency = 20 gives the best performance in the parameter space!
  • There are many false negatives.
  Intuitions built:
  1. Viola-Jones has learnt a complete, intensity-coherent pattern for each rectangular feature, not just {interest points}.  
  2. But "most of the learning" is in high-frequency components, as HOMO has "BETTER PRECISION"! 
  3. False negatives increase a lot with HOMO. 

Further, we did the analysis on the non-face datasets, and indeed HOMO DOES NOT GET CONFUSED by the HIGH-FREQUENCY information of faces versus non-faces.
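A minimal sketch of the homomorphic filtering step, assuming a Gaussian high-pass transfer function with the cutoff frequency of 20 found above (the gain terms are placeholders; the repo's implementation may use a different filter shape):

    import numpy as np

    def homomorphic_filter(gray, cutoff=20, gamma_low=0.5, gamma_high=2.0):
        # Log domain: image = illumination (low freq.) * reflectance (high freq.).
        img = np.log1p(gray.astype(np.float64))
        rows, cols = img.shape
        u = np.arange(rows) - rows // 2
        v = np.arange(cols) - cols // 2
        d2 = u[:, None] ** 2 + v[None, :] ** 2       # squared distance from centre
        # Gaussian high-pass, scaled between gamma_low and gamma_high.
        h = (gamma_high - gamma_low) * (1 - np.exp(-d2 / (2 * cutoff ** 2))) + gamma_low
        spectrum = np.fft.fftshift(np.fft.fft2(img))
        filtered = np.real(np.fft.ifft2(np.fft.ifftshift(h * spectrum)))
        out = np.expm1(filtered)
        out = (out - out.min()) / (out.max() - out.min() + 1e-8)   # back to 8-bit range
        return (out * 255).astype(np.uint8)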

3. Retinex FM

(Plot: increase in precision and recall.)

Illumination normalization seems to predict a lot more faces than original Viola-Jones!

 Inverse relation between precision and recall: recall increases and precision decreases.  

It confuses Viola-Jones by making all regions closer to face intensity patterns, and it also increases positive detections (true and false both).

  • Iterations = 9 is relatively good in the parameter space!
  • There are many false positives.
  Intuitions built:
  1. The false-positive graph has many peaks (some too high), so we choose a "low peak as a balance".  
  2. But the FALSE POSITIVES have low confidence scores, so with "thresholding on confidence (level_weights)" this method has some promise!  

4. SSR

(Plot: increase in precision and recall.)

It slightly adversely affects detection. It lowers positive detections (true and false both). Like HOMO it acts as a high-pass filter and has similar effects, but less extreme.

 Inverse relation between precision and recall: recall decreases and precision increases.  
  • Not much better than original Viola-Jones.

  • Sigma = 211 is relatively good in the parameter space!

  • There are very few positives.

  Intuitions built:
  1. One very curious thing: at sigma = 140 there is a major "drop in detections" (fewer than 10% of detections), and then it goes back to normal.  
  2. Thus there is something very particular about the high-frequency image corresponding to Gaussian sigma = 140 that makes every region look like a non-face; what Viola-Jones has learnt is not there. 
  3. False positives on the non-face datasets increase. 
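A sketch of the SSR step in its standard log-ratio form, R = log(I) - log(I * G_sigma), using the sigma = 211 setting discussed above:

    import cv2
    import numpy as np

    def single_scale_retinex(gray, sigma=211):
        img = gray.astype(np.float64) + 1.0              # avoid log(0)
        blurred = cv2.GaussianBlur(img, (0, 0), sigma)   # kernel size derived from sigma
        retinex = np.log(img) - np.log(blurred)
        retinex = (retinex - retinex.min()) / (retinex.max() - retinex.min() + 1e-8)
        return (retinex * 255).astype(np.uint8)          # rescale for the detector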

5. Bilateral Filter

(Plot: increase in precision and recall.)

 There is an "inverse" realtion between "precision" and recall:.

Sigma Illuminance is auto estimated as per the intenisty signature of image. Sigma Spatial

  Intuitions built:
  1. Less positives. Thus face patterns are being destroyed by smoothing. 
     

There is a continuous degradatiopn in performance thus good to choose sigma=1.

    Is worse than original viola jones!
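A sketch of the bilateral filtering step; estimating the luminance sigma from the image's intensity spread is an assumption about how the "auto estimated" value is realised, and spatial sigma = 1 follows the observation above:

    import cv2
    import numpy as np

    def bilateral_preprocess(gray, sigma_spatial=1):
        # Luminance (range) sigma estimated from the image's intensity spread.
        sigma_color = float(np.std(gray))
        # Neighbourhood diameter 0 lets OpenCV derive it from the spatial sigma.
        return cv2.bilateralFilter(gray, 0, sigma_color, sigma_spatial)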

6. CLAHE

(Plot: increase in precision and recall.)

Histogram equalization makes a lot more positive (FP + TP) predictions than original Viola-Jones.

 Both precision and recall decrease.  
  • Clip limit = 2 is relatively good in the parameter space!
  Intuitions built:
  1. Behaviour varies a lot with the dataset.  
  2. TP decreases, FP increases and FN decreases slightly.
  3. It is not a stable algorithm to apply before Viola-Jones.   
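The CLAHE step itself is tiny in OpenCV; a sketch using the clip limit of 2 found above (the 8x8 tile grid is OpenCV's default and an assumption here):

    import cv2

    def clahe_preprocess(gray, clip_limit=2.0, tile=8):
        clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=(tile, tile))
        return clahe.apply(gray)   # expects an 8-bit single-channel image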

7. TV Chambolle

(Plot: increase in precision and recall.)

It makes it hard for Viola-Jones to detect!

  Precision increases and recall decreases, but the positive detections are very few.  
  • Weight = 3 is chosen from the parameter space
  Intuitions built:
  1. Performance is roughly constant across the parameter space, with minor degradation. 
  2. Very few detections. 
     
  • Not suitable to apply.
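A sketch of the TV Chambolle step via scikit-image, using the weight chosen above (note that the effective smoothing strength also depends on the intensity scale the images are fed in at):

    import numpy as np
    from skimage.restoration import denoise_tv_chambolle

    def tv_chambolle_preprocess(gray, weight=3):
        img = gray.astype(np.float64) / 255.0            # work on a [0, 1] scale
        denoised = denoise_tv_chambolle(img, weight=weight)
        return (np.clip(denoised, 0, 1) * 255).astype(np.uint8)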

8. MSR

It makes it hard for Viola-Jones to detect!

  Precision and recall are inversely related for some datasets and in sync for others.  
  • Number of iterations = 10 and sigma list = 90_100_110_120_130_140_150_160_170_180 are chosen from the parameter space
  Intuitions built:
  1. There are regions in the parameter space where precision and recall both increase. 
  2. There are also corresponding regions with a slight improvement in the confidence score. 
     
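A sketch of the MSR step as an equally weighted average of single-scale retinex outputs over the sigma list above (the equal weighting is the usual convention and an assumption here):

    import cv2
    import numpy as np

    def multi_scale_retinex(gray, sigmas=(90, 100, 110, 120, 130, 140,
                                          150, 160, 170, 180)):
        img = gray.astype(np.float64) + 1.0              # avoid log(0)
        acc = np.zeros_like(img)
        for sigma in sigmas:
            blurred = cv2.GaussianBlur(img, (0, 0), sigma)
            acc += np.log(img) - np.log(blurred)         # one SSR term per sigma
        msr = acc / len(sigmas)
        msr = (msr - msr.min()) / (msr.max() - msr.min() + 1e-8)
        return (msr * 255).astype(np.uint8)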

9. NMBN

(Plot: increase in precision and recall.)

It makes it hard for Viola-Jones to detect! Very few detections.

  Inverse relationship between precision and recall: precision increases and recall decreases.  
  • Parameters: patch size = 5, h = 0.7, patch distance = 2 are chosen from the parameter space
  Intuitions built:
  1. It hurts Viola-Jones. 
  2. Very few detections. 
  3. But it still leaves some low-frequency components intact and thus is not as extreme as HOMO.
     
  • Not suitable to apply.
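A sketch of the non-local-means step via scikit-image, plugging in the patch size 5, patch distance 2 and h = 0.7 reported above (interpreting h on a [0, 1] intensity scale is an assumption):

    import numpy as np
    from skimage.restoration import denoise_nl_means

    def nl_means_preprocess(gray, patch_size=5, patch_distance=2, h=0.7):
        img = gray.astype(np.float64) / 255.0
        denoised = denoise_nl_means(img, patch_size=patch_size,
                                    patch_distance=patch_distance, h=h)
        return (np.clip(denoised, 0, 1) * 255).astype(np.uint8)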

Results

(Plots: precision and recall, shown as increase from Viola-Jones; confidence, shown as percentage increase from Viola-Jones.)

  1. BDA

    • Hurts the Viola-Jones detector.
    • Precision : inc (less predictions)
    • Recall : dec (more false negatives)
    • Confidence Score: dec (-7 % of viola )
    • Verdict : NO
  2. Bilateral

    • Mostly not better than Viola-Jones, with high across-dataset variation.
    • Precision : almost same
    • Recall : dec (more false negatives)
    • Confidence Score: almost viola
    • Verdict : Maybe
  3. Gaussian Blur

    • Gives a slight edge on certain datasets, but overall precision goes down.
    • Precision : dec
    • Recall : almost same (this is actually slightly better)
    • Confidence Score: same
    • Verdict : Maybe
  4. HE (histogram equalization)

    • Detects more positives (TP and FP)
    • Precision : much dec
    • Recall : almost same
    • Confidence Score: 5% decrease
    • Verdict : NO
  5. HOMO (homomorphic)

    • Detection rate very low
    • Precision : much inc
    • Recall : much dec
    • Confidence Score: much dec
    • Verdict : Only if very high Precision needed
  6. logarithmic intensity stretch

    • good recall , same precision
    • Precision : inc or dec
    • Recall : inc
    • Confidence Score:
    • Verdict : Yes
  7. MSR

    • precision same, recall dec
    • Precision : same
    • Recall : dec
    • Confidence Score: dec
    • Verdict : NO
  8. logarithmic intensity stretch

    • good recall , same precision
    • Precision : inc or dec
    • Recall : inc
    • Confidence Score: same as viola (3% decrease max)
    • Verdict : Yes
  9. SSR

    • Precision inc and recall is almost same. But total predictions decrease.
    • Precision : inc
    • Recall : same
    • Confidence Score: decreases (10% max decrease)
    • Verdict : Maybe
  10. TV Chambolle

    • Not desirable.
    • Precision : inc
    • Recall : dec
    • Confidence Score: much decrease
    • Verdict : NO
  11. retinex FM

    • All metrics very close to original viola jones
    • Precision : almost same (slight dec)
    • Recall : inc
    • Confidence Score: improves
    • Verdict : Yes
  12. CLAHE

    • Makes many false predictions
    • Precision : much dec
    • Recall : inc
    • Confidence Score: same as viola
    • Verdict : NO
  13. NMBN

    • Detection rate very low
    • Precision : much inc
    • Recall : much dec
    • Confidence Score: much dec
    • Verdict : Only if very high Precision needed

Conclusions

  • The Viola-Jones classifier has learnt information about low frequencies with importance, i.e. high-frequency components like edges/corners are important to learn, but it is the complete pattern in a rectangular feature that matters for detection, not just the high-gradient patterns.
    (This is supported by the degraded performance of those methods that ignore low frequencies and focus on high frequencies, like HOMO, gradient normalization, etc.)
  • The high-frequency information that Viola-Jones has learnt is very specific to faces, i.e. those high-frequency patterns are not found in non-faces.
  • Viola-Jones is not illumination invariant. Under severe illumination changes a face is treated as a non-face, and non-faces could be detected as faces if illumination normalized.

Positive-Preprocessings found:

1. retinex FM (Best)

2. logarithmic intensity stretch

3. SSR

4. A series of preprocessing steps can be used:

Use HE first to detect all face candidates, then choose amongst them using HOMO (a sketch of this two-stage idea follows below).
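A minimal sketch of that two-stage idea, reusing OpenCV's Haar cascade and the homomorphic_filter() sketch from the HOMO section above (the scale factors and neighbour thresholds here are placeholders):

    # HE proposes face candidates; the conservative HOMO pass confirms them.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def he_then_homo(gray):
        candidates = cascade.detectMultiScale(cv2.equalizeHist(gray), 1.1, 5)
        confirmed = []
        for (x, y, w, h) in candidates:
            patch = homomorphic_filter(gray[y:y + h, x:x + w])
            # Keep the candidate only if it is still detected after HOMO.
            if len(cascade.detectMultiScale(patch, 1.1, 3)) > 0:
                confirmed.append((x, y, w, h))
        return confirmed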

Pro-Tips:

  • If you want to be very sure a detection is a face: apply a high-frequency-enriching method like HOMO, or NMBN if denoising.
  • If you want more regions detected as faces: apply a dynamic intensity-range stretch like Retinex FM.
  • If you want to escape from Viola-Jones, study the SSR output at sigma = 140; there Viola-Jones has absolutely no clue. The peculiar thing is that at other sigmas it behaves normally!
  • Bilateral has high across-dataset variation but good performance, so it can be used in non-blind cases.

(back to top)

To-DO

  • Study what is there in SSR output for sigma= 140.
  • Study histogram matching to a normal Gaussian distribution
  • Write references

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Bhartendu Kumar - email - bhartendukumar1998 [ @ ]gmail [dot] com

Project Link: https://github.com/Bhartendu-Kumar/DIP

(back to top)

Acknowledgments

  • Will have to make a comprehensive reference list.

(back to top)