Unsupervised clustering of movie posters using features extracted from a convolutional neural network (CNN). The visualization uses Flask for the backend and D3.js for the frontend.
This project is divided into 3 main scripts:
- get_posters.py
  - retrieve the posters from impawards.com.
  - create a thumbnail of each poster for the visualization.
- get_features_from_cnn.py
  - extract the features of each poster from the convolutional neural network.
- get_data_visu.py
  - reduce the feature dimension with UMAP for the data visualization.
  - compute the cosine similarity and extract the 6 "closest" images for each poster.
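The similarity step can be illustrated with a minimal NumPy sketch. This is not the code from get_data_visu.py: the function name `closest_posters` and the toy feature vectors are assumptions made for illustration, and the real script may organize the computation differently.

```python
import numpy as np


def closest_posters(features, k=6):
    """Return, for each row, the indices of its k most cosine-similar other rows."""
    features = np.asarray(features, dtype=np.float64)
    # L2-normalise the rows so that a dot product equals the cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    # Dense (n, n) matrix: this is where the O(n^2) memory cost comes from.
    similarity = unit @ unit.T
    # Exclude each poster from its own neighbour list.
    np.fill_diagonal(similarity, -np.inf)
    # Sort each row by decreasing similarity and keep the first k columns.
    order = np.argsort(-similarity, axis=1)
    return order[:, :k]


# Toy vectors standing in for CNN features (hypothetical data).
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
neighbours = closest_posters(toy, k=1)
```

With the default `k=6`, each row of the result lists the six nearest posters for one poster, which is the shape of data the visualization needs.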
To get a description of each script's parameters:
- python src/get_XXX.py --help
- Linux/Unix/OSX (requirement for wget)
- Python 3.3+
- ImageMagick
- PostgreSQL
- BeautifulSoup 4.4
- TensorFlow
- Keras
- Pandas
- requests
- sklearn
- numpy
- PIL
- flask
Extracting the features from the ConvNet is slow if you do not have a GPU. Computing the similarity between every pair of posters requires O(n^2) memory, which amounted to around 32 GB of RAM.
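To see how the quadratic memory cost grows, the size of a dense similarity matrix can be estimated with a few lines of Python. This is a back-of-the-envelope sketch: the poster count of 50,000 and the 4-byte (float32) value size are assumptions for illustration, not figures from this project.

```python
def similarity_matrix_bytes(n_posters, bytes_per_value=4):
    """Memory needed to hold a dense n x n similarity matrix."""
    return n_posters * n_posters * bytes_per_value


def as_gigabytes(n_bytes):
    """Convert a byte count to gigabytes (10^9 bytes)."""
    return n_bytes / 1e9


# Hypothetical collection of 50,000 posters stored as float32 values.
example_gb = as_gigabytes(similarity_matrix_bytes(50000))
```

Doubling the number of posters quadruples the estimate, and switching to 8-byte floats doubles it again, so the requirement climbs quickly with the size of the collection.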
Clone the repository:
$ git clone https://github.com/adrz/movie-posters-convnet.git
$ cd movie-posters-convnet/
$ virtualenv -p python3 env
$ source env/bin/activate
$ pip install -r requirements-gpu.txt
Create the PostgreSQL database (assuming PostgreSQL is already installed):
$ psql -U postgres -c "create user movieposters;"
$ psql -U postgres -c "create database movieposters;"
$ psql -U postgres -c "alter user movieposters with encrypted password 'yourpassword';"
$ psql -U postgres -c "grant all privileges on database movieposters to movieposters ;"
After cloning, run the three scripts below in order. They will:
- download the posters from 1920 to 2016
- compute the CNN features
- compute the data-visualization features
$ python src/get_posters.py -c config/development.conf
$ python src/get_features_from_cnn.py -c config/development.conf
$ python src/get_data_visu.py -c config/development.conf
Then grab a coffee...
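If you prefer to start the whole pipeline with a single command, the three steps can be chained from a small Python driver. This is a hypothetical helper, not a file shipped with the project: the names `build_commands` and `run_pipeline` are assumptions, and the script names are the ones listed at the top of this README.

```python
import subprocess
import sys

# Pipeline steps, in the order they must run.
STEPS = ["get_posters.py", "get_features_from_cnn.py", "get_data_visu.py"]


def build_commands(config="config/development.conf", python=sys.executable):
    """Build one command line per pipeline step, in execution order."""
    return [[python, "src/" + script, "-c", config] for script in STEPS]


def run_pipeline(config="config/development.conf"):
    """Run each step in turn and stop at the first one that fails."""
    for command in build_commands(config):
        print("Running:", " ".join(command))
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command)
```

Calling `run_pipeline()` from the repository root, with the virtualenv activated, runs the three scripts back to back. Stopping at the first failure avoids computing features on an incomplete download.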
$ source env/bin/activate
$ export configapi=./config/development.conf
$ python app.py
Then open index.html in your favorite browser:
$ chromium 127.0.0.1:5000/index.html
or
$ chromium 127.0.0.1:5000/index_complete.html
Cherry-picked examples from the 200 closest pairs of posters (by cosine distance):
This project is licensed under the MIT License; see the LICENSE.md file for details.
- posters: IMP Awards