CoverGAN

CoverGAN is a set of tools and machine learning models designed to generate good-looking album covers based on users' audio tracks and emotions. Resulting covers are generated in vector graphics format (SVG).

Available emotions:

Anger
Comfortable
Fear
Funny
Happy
Inspirational
Joy
Lonely
Nostalgic
Passionate
Quiet
Relaxed
Romantic
Sadness
Serious
Soulful
Surprise
Sweet
Wary

CoverGAN can be run on a machine without a GPU, however, if a GPU is available, the model will use it.

This tutorial is for Linux.

Dependencies (for local test and training)

Install PyTorch. See https://pytorch.org/ for PyTorch install instructions.

Install other python requirements:

pip install -r requirements.txt

Install DiffVG:

Make sure that CMake is installed. See https://cmake.org/install/ for CMake install instructions.
git clone --recursive https://github.com/BachiLi/diffvg
cd diffvg && python setup.py install
cd .. && rm -rf diffvg

Testing using Docker

Building

Specify PyTorch version to install in Dockerfile .
Build the image:

docker build -t covergan .

Running

Run the container:

With CUDA enabled:

docker run --rm --network="host" --gpus 1 covergan

Only CPU using:

docker run --rm --network="host" covergan

Testing

Below is an example command that can be used to trigger the generation endpoint:

curl --progress-bar \
    -F "audio_file=@/home/user/audio.flac" \
    "http://localhost:5001/generate?track_artist=Cool%20Band&track_name=Song&emotion=joy" \
    -o ./output.json

Local testing

Compile ProtoSVG server

In protosvg folder:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh (type yes, after 1)
source $HOME/.cargo/env
rustup component add rustfmt
cargo install --locked --path .
Run as background process the ProtoSVG server by command protosvg

Running

In covergan folder:

python3 ./eval.py \
  --audio_file="test.mp3" \
  --emotions=joy,relaxed \
  --track_artist="Cool Band" \
  --track_name="New Song"

The resulting .svg covers by default will be saved to ./gen_samples folder.

Local training

Main directory is covergan.

CoverGAN network training

Put audio tracks (.flac or .mp3 format) to default ./audio folder.
Put original covers (.jpg format) to default ./clean_covers folder.
See this help document for more details about specified options.
Run ./covergan_train.py file with specified options.

Example: python3 ./covergan_train.py --emotions emotions.json

Captioner network training

Put original covers (.jpg format) in ./original_covers folder.
If the covers title and author name have been already saved to data.json file (which for each cover stores the coordinates of the captures and their text color), it should be stored at ./checkpoint/caption_dataset/data.json.
Or else put clean (with captures removed) covers (.jpg format) to ./clean_covers folder.
See this help document for more details about specified options.
Run ./captioner_train.py file with specified options.

Example: python3 ./captioner_train.py --clean_covers ./clean_covers

Examples of generated covers

See this folder with simple music tracks and their generated covers.

sslavian812/CoverGAN-izharsky

CoverGAN

Dependencies (for local test and training)

Testing using Docker

Building

Running

Testing

Local testing

Compile ProtoSVG server

Running

Local training

CoverGAN network training

Captioner network training

Examples of generated covers