/ML-Aesthetics-NIMA

A PyTorch implementation of NIMA using DenseNet. NIMA (Neural Image Assessment) is a research model that rates the aesthetic quality of images.


About

This project is an implementation of the Neural Image Assessment (NIMA) paper. While the paper uses Inception-v2, MobileNet, and VGG16 as base models, this project uses DenseNet121. In the results below, DenseNet121 reaches a higher accuracy than all three of those models while having far fewer parameters than VGG16 and Inception-v2 (though about twice as many as MobileNet). Compared with the larger models, this makes training faster and inference computationally cheaper.

Correlations of DenseNet

As in the NIMA paper, DenseNet121 was fine-tuned on the Aesthetic Visual Analysis (AVA) dataset, which contains over 250,000 images, each rated aesthetically on a scale of 1-10 by around 200 amateur photographers. AVA2 denotes the subset of images whose mean rating falls in the top 10% or bottom 10% of AVA; these are labeled "beautiful" and "ugly" respectively. The Accuracy (AVA2) column of the table below was calculated by using a mean score of 0.5 as the threshold between "beautiful" and "ugly".
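The AVA2 selection and the thresholded accuracy described above can be sketched in plain Python. This is only an illustration, not the code used in this repository: the function names `ava2_subset` and `binary_accuracy` are hypothetical, and the threshold is left as a parameter because it has to be on the same scale as the predicted mean scores.

```python
def ava2_subset(mean_ratings, fraction=0.10):
    """Return (index, label) pairs for the lowest and highest `fraction` of images."""
    # Sort image indices by their mean rating, lowest first.
    order = sorted(range(len(mean_ratings)), key=lambda i: mean_ratings[i])
    k = int(len(order) * fraction)
    ugly = [(i, "ugly") for i in order[:k]]
    beautiful = [(i, "beautiful") for i in order[len(order) - k:]]
    return ugly + beautiful


def binary_accuracy(predicted_means, labels, threshold):
    """Fraction of images whose thresholded predicted mean matches the label."""
    correct = 0
    for score, label in zip(predicted_means, labels):
        predicted = "beautiful" if score > threshold else "ugly"
        correct += predicted == label
    return correct / len(labels)


# Toy data: ten images with made-up mean ratings on the 1-10 scale.
ratings = [3.1, 7.9, 5.0, 5.2, 4.8, 5.5, 6.0, 4.0, 5.1, 4.9]
subset = ava2_subset(ratings)          # one "ugly" and one "beautiful" image
labels = [label for _, label in subset]
print(subset, binary_accuracy([2.9, 6.5], labels, threshold=5.0))
```

Here the toy predictions and the threshold of 5.0 are both on the raw 1-10 scale; if scores are normalized first, the threshold would change accordingly.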


| Model | Parameters | Accuracy (AVA2) | LCC (mean) | SRCC (mean) | LCC (std) | SRCC (std) | EMD |
|---|---|---|---|---|---|---|---|
| NIMA(MobileNet) | 4M | 80.36% | 0.518 | 0.510 | 0.152 | 0.137 | 0.081 |
| NIMA(VGG16) | 138M | 80.60% | 0.610 | 0.592 | 0.205 | 0.202 | 0.051 |
| NIMA(Inception-v2) | 24M | 81.51% | 0.638 | 0.612 | 0.233 | 0.218 | 0.050 |
| NIMA(DenseNet121) | 8M | 82.87% | 0.648 | 0.634 | 0.287 | 0.270 | 0.083 |

I found it unusual that my EMD loss was higher than all the others even though the correlations and the accuracy on AVA2 were so high. My theory is that the researchers omitted the square root in their formulation of the EMD loss function, which could account for the difference.
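To illustrate why the square root matters, here is a minimal plain-Python sketch of the EMD between two rating distributions as formulated in the NIMA paper with r = 2: the r-th root of the mean r-th power of the differences between the two cumulative distributions. The function name `emd` and the `take_root` switch are hypothetical and exist only for this comparison; this is not the loss code from this repository.

```python
from itertools import accumulate


def emd(p, q, r=2, take_root=True):
    """EMD between two discrete distributions over the same ordered rating bins."""
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # Mean of |CDF_p(k) - CDF_q(k)|^r over all bins.
    mean_diff = sum(abs(a - b) ** r for a, b in zip(cdf_p, cdf_q)) / len(p)
    return mean_diff ** (1 / r) if take_root else mean_diff


# Toy example: all mass on rating 5 versus all mass on rating 6.
p = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(emd(p, q))                   # with the square root
print(emd(p, q, take_root=False))  # without the square root
```

Because the CDF differences lie between -1 and 1, the mean squared difference lies between 0 and 1, so taking the square root can only keep the value the same or make it larger. A loss reported without the root would therefore look smaller for the same predictions.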

Example Usage

python main.py images/dog.jpg
Probability distribution of 1-10 rating scale
[0.005 0.017 0.048 0.155 0.326 0.261 0.114 0.049 0.017 0.009]

Mean score
5.422

Standard Deviation
1.431
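The mean score and standard deviation follow directly from the predicted distribution. The sketch below shows that calculation in plain Python; the function name `score_stats` is hypothetical and this is not necessarily how `main.py` computes them.

```python
import math


def score_stats(probs):
    """Mean and standard deviation of a distribution over ratings 1..len(probs)."""
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalize in case of rounding
    ratings = range(1, len(probs) + 1)
    mean = sum(r * p for r, p in zip(ratings, probs))
    variance = sum(p * (r - mean) ** 2 for r, p in zip(ratings, probs))
    return mean, math.sqrt(variance)


probs = [0.005, 0.017, 0.048, 0.155, 0.326, 0.261, 0.114, 0.049, 0.017, 0.009]
mean, std = score_stats(probs)
print(round(mean, 2), round(std, 2))
```

Applied to the rounded probabilities printed above, this gives values within about 0.01 of the reported mean and standard deviation; the small gap presumably comes from the probabilities being rounded to three decimals.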

Acknowledgements