/movie-posters-convnet

Unsupervised clustering of movie posters with features extracted from Convolutional Neural Network

Demo

Overview

Unsupervised clustering of movie posters using features extracted from a convolutional neural network. The visualization uses Flask for the backend and D3.js for the frontend.

The project is divided into three main scripts:

  • get_posters.py
    • retrieves the posters from impawards.com.
    • creates a thumbnail of each poster for the visualization.
  • get_features_from_cnn.py
    • extracts the output of the last convolutional layer of a pre-trained ConvNet (VGG-16 or ResNet50).
  • get_data_visu.py
    • reduces dimensionality for data visualization with UMAP.
    • computes the cosine similarity and extracts the 6 "closest" images for each poster.
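The nearest-neighbour step described for get_data_visu.py can be sketched as follows. This is a minimal, dependency-free illustration that assumes each poster is already represented by a feature vector. The function names and the toy vectors are hypothetical and are not taken from the project, which may well rely on NumPy or scikit-learn for this step.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)


def closest_posters(features, k=6):
    """For each poster index, return the indices of its k most similar posters."""
    neighbours = []
    for i, u in enumerate(features):
        # Score every other poster against poster i, skipping i itself.
        scored = (
            (cosine_similarity(u, v), j)
            for j, v in enumerate(features)
            if j != i
        )
        top = heapq.nlargest(k, scored)
        neighbours.append([j for _, j in top])
    return neighbours


# Toy 2-D "features" (hypothetical values, not real CNN output).
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_neighbours = closest_posters(toy_features, k=1)
```

Because each poster is processed one row at a time, this sketch never holds the full pairwise similarity matrix in memory, at the cost of being much slower than a vectorized implementation.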

To see a description of each script's parameters:

  • python src/get_XXX.py --help
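For readers unfamiliar with how such a `--help` flag is typically provided, here is a hedged sketch using Python's standard argparse module. It is not the project's actual parser: the `--config` long option, the default path, and the help text are assumptions made for illustration (only the `-c` flag appears in this README).

```python
import argparse


def build_parser():
    """Build a hypothetical parser resembling the scripts' command line."""
    parser = argparse.ArgumentParser(
        description="Hypothetical skeleton for one of the get_XXX.py scripts."
    )
    # "-c" is used in this README; the "--config" alias is an assumption.
    parser.add_argument(
        "-c",
        "--config",
        default="config/development.conf",
        help="path to the configuration file",
    )
    return parser


# argparse adds -h/--help automatically; here we parse an explicit list
# instead of sys.argv so the example runs anywhere.
args = build_parser().parse_args(["-c", "config/development.conf"])
```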

Requirements

OS

  • Linux/Unix/OS X (wget is required)
  • Python 3.3+
  • ImageMagick
  • PostgreSQL

Python packages

  • BeautifulSoup 4.4
  • TensorFlow
  • Keras
  • pandas
  • requests
  • scikit-learn
  • NumPy
  • Pillow (PIL)
  • Flask

Warnings

Extracting the features from the ConvNet is slow if you do not have a GPU. Computing the similarity between every pair of posters requires O(n^2) memory, which amounts to around 32 GB of RAM.
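To make the quadratic memory growth concrete, here is a small back-of-the-envelope calculation. The collection size and the 4-byte (float32) value size are assumptions chosen for illustration; the README does not state the actual number of posters or the data type used.

```python
def similarity_matrix_bytes(n_posters, bytes_per_value=4):
    """Bytes needed to hold a dense n x n similarity matrix."""
    return n_posters * n_posters * bytes_per_value


# Hypothetical collection size, chosen only to illustrate the scaling.
n = 90_000
gigabytes = similarity_matrix_bytes(n) / 1e9
print(f"{n} posters -> {gigabytes:.1f} GB")  # prints "90000 posters -> 32.4 GB"
```

Doubling the number of posters quadruples the memory requirement, which is why the similarity step dominates RAM usage.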

Installation

Clone the repository and install the dependencies in a virtual environment:

$ git clone https://github.com/adrz/movie-posters-convnet.git
$ cd movie-posters-convnet/
$ virtualenv -p python3 env
$ source env/bin/activate
$ pip install -r requirements-gpu.txt

Create the PostgreSQL database (assuming PostgreSQL is already installed):

$ psql -U postgres -c "create user movieposters;"
$ psql -U postgres -c "create database movieposters;"
$ psql -U postgres -c "alter user movieposters with encrypted password 'yourpassword';"
$ psql -U postgres -c "grant all privileges on database movieposters to movieposters;"
$ PGPASSWORD=yourpassword nohup pg_dump -U movieposters -d moviepostersv2 -f db_dump_hope.sql > pg_dump.log 2>&1 &

Usage

Computation

After cloning, run the three scripts below in order. They will:

  • download the posters from 1920 to 2016
  • compute the features
  • compute the data-visualization features

$ python3 src/get_posters.py -c config/development.conf
$ python3 src/get_features_from_cnn.py -c config/development.conf
$ python3 src/get_data_visu.py -c config/development.conf

Then grab a coffee...

Visualization

$ source env/bin/activate
$ export configapi=./config/development.conf
$ python3 app.py

Then open index.html in your favorite browser:

$ chromium 127.0.0.1:8080/index.html

or

$ chromium 127.0.0.1:8080/index_complete.html
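The `export configapi=...` step above suggests that the application locates its configuration through an environment variable. A minimal sketch of that pattern, assuming app.py reads `configapi` at startup, could look like this. The function name and the fallback path are hypothetical and may not match the project's actual code.

```python
import os


def resolve_config_path(environ=os.environ, default="./config/development.conf"):
    """Return the config path from the `configapi` variable, or a fallback.

    The fallback value is an assumption made for this example.
    """
    return environ.get("configapi", default)


# Passing a plain dict instead of os.environ keeps the example reproducible.
config_path = resolve_config_path({"configapi": "./config/development.conf"})
```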

Results

Cherry-picked examples from the 200 closest pairs of posters (by cosine distance):

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments