This project was implemented as part of the Coding da Vinci Hackathon and uses this data set from the Augustinermuseum. Our project can be split into three use cases:
- Style Transfer
- Similarity
- Exploration
Our project consists of a Python backend (styletransfer, similarity, videos, utils) and a JavaScript React Bootstrap frontend (sdm_matcher). The connection between the two is made through Flask in Python and AJAX in JavaScript.
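As a rough illustration of this connection, here is a minimal Flask endpoint sketch; the route name, payload, and response shape are assumptions, not the actual API of app.py:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical endpoint: the React frontend would POST an image here via AJAX.
@app.route("/similarity", methods=["POST"])
def similarity():
    uploaded = request.files["image"]        # uploaded portrait from the GUI
    uploaded.save("uploaded.jpg")            # persist it for the prediction step
    # ... run the similarity prediction here and collect matching portraits ...
    return jsonify({"portraits": []})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)     # same address/port the tunnel forwards
```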
- Install the environment with conda
- Install the Python Flask server on the remote server with a GPU
- Start app.py on the remote server
- Start the tunnel from the server to your local device at 127.0.0.1:5000
- Install React and npm on the local device
- Install all dependencies in sdm_matcher using npm
- Start the frontend on localhost
This is a modification of Neural Neighbor Style Transfer. The modifications run the iterations in FP16 on the GPU, which reduces execution time by a factor of two while the quality remains almost the same.
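A minimal sketch of what such a mixed-precision (FP16) iteration loop looks like with torch.cuda.amp; the tensors and the loss below are placeholders, not the actual NNST losses:

```python
import torch

device = "cuda"
# Placeholder tensors; in the real pipeline these come from the NNST optimization.
stylized = torch.rand(1, 3, 256, 256, device=device, requires_grad=True)
target = torch.rand(1, 3, 256, 256, device=device)

optimizer = torch.optim.Adam([stylized], lr=2e-3)
scaler = torch.cuda.amp.GradScaler()          # keeps FP16 gradients numerically stable

for step in range(200):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # forward pass runs in FP16 where safe
        loss = torch.nn.functional.mse_loss(stylized, target)
    scaler.scale(loss).backward()             # backward on the scaled loss
    scaler.step(optimizer)
    scaler.update()
```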
The interface provides the method execute_one(...) for processing one content image with one style image.
The content and style images are expected under /styletransfer/NeuralNeighborStyleTransfer/inputs.
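A hypothetical call to execute_one; the import path and the parameters are assumptions, since only the method name and the inputs folder are documented here:

```python
# Hypothetical usage; the actual signature of execute_one may differ.
from styletransfer import execute_one  # assumed import path

# Both files are expected under /styletransfer/NeuralNeighborStyleTransfer/inputs.
result = execute_one("content.jpg", "style.jpg")
```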
We adapted an unsupervised learning method to cluster the portraits. The trained clustering model is stored in a pickle file, and the portraits, clustered into eight groups, are stored in a pickle file as well. When a user uploads a new picture through the GUI, we exploit the similarity and predict which cluster has the highest similarity with the new picture.
The details are as follows:
- Extract the features of each portrait.
- Apply the k-means algorithm to process the feature matrix and form a robust clustering model (a sketch of these steps follows this list).
- Use this model and its predict function to decide which cluster a new picture belongs to.
- Return all the portraits in this cluster.
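A minimal sketch of the training part of this pipeline using scikit-learn; the feature extraction (downscaled grayscale pixels), the file names, and the paths are assumptions, not the project's actual implementation:

```python
import glob
import pickle
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def extract_features(image_paths):
    """Very simple placeholder features: downscaled grayscale pixel values."""
    feats = []
    for path in image_paths:
        img = Image.open(path).convert("L").resize((64, 64))
        feats.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
    return np.stack(feats)

portrait_paths = sorted(glob.glob("portraits/*.jpg"))        # assumed data location
features = extract_features(portrait_paths)

kmeans = KMeans(n_clusters=8, random_state=0).fit(features)  # eight groups, as above

with open("kmeans_model.pkl", "wb") as f:                    # store the trained model
    pickle.dump(kmeans, f)
with open("clustered_portraits.pkl", "wb") as f:             # store portraits per cluster
    pickle.dump({"paths": portrait_paths, "labels": kmeans.labels_}, f)
```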
How to use it?
- Install all the necessary packages from requirements.txt and import them.
- Run the cluster function to train a model.
- Run the prediction function to get the most similar portraits (see the sketch below).
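A sketch of the prediction step under the same assumptions (it reuses the extract_features helper and the pickle file names from the sketch above):

```python
import pickle

with open("kmeans_model.pkl", "rb") as f:          # trained k-means model
    kmeans = pickle.load(f)
with open("clustered_portraits.pkl", "rb") as f:   # portraits with their cluster labels
    clustered = pickle.load(f)

# Features of the uploaded picture, computed the same way as during training.
new_features = extract_features(["uploaded.jpg"])
cluster_id = int(kmeans.predict(new_features)[0])  # most similar cluster

# Return every portrait that falls into the predicted cluster.
similar = [p for p, label in zip(clustered["paths"], clustered["labels"])
           if label == cluster_id]
print(similar)
```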
We created videos in .mp4 format with lip-sync, in which the portraits introduce themselves. Files that are relevant for this use case, including requirements.txt, can be found in /videos.
To realise this we had to go through the following steps:
- Extract the needed information for every portrait from the JSON file provided in the data set. (The extraction code can be found in text_to_speech.py.)
- Convert the resulting text to audio. For the female voice in German we used the gtts library here; for the male voice in German we used the ibm-watson library here, and the available voice models can be found here. (A short gtts sketch follows below.)
- Convert the .jpg images to .mp4 videos. A face has to be present in every frame, or else the model (see next step) will throw an error.
- Feed each audio and the corresponding video into the Wav2Lip model; the output is a lip-synced video in .mp4 format.
All video results can be found here.
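For the text-to-audio step, a minimal gtts sketch; the text and the output file name here are placeholders:

```python
from gtts import gTTS

# In the project the text comes from the data-set JSON (see text_to_speech.py).
text = "Guten Tag, ich bin ein Porträt aus dem Augustinermuseum."

tts = gTTS(text=text, lang="de")    # female German voice via Google TTS
tts.save("portrait_intro.mp3")      # audio that is later fed into Wav2Lip
```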