JoliGAN provides easy-to-use GAN and Diffusion models for unpaired and paired image to image translation tasks, including domain adaptation. In a nutshell, JoliGAN allows for fast and stable training with astonishing results. A server with a REST API is provided for simplified deployment and usage.
JoliGAN has a large scope of options and parameters. To avoid getting overwhelmed, follow the simple steps below, then use the links to more detailed documentation on models, dataset formats, and data augmentation.
- AR and metaverse: replace any image element with super-realistic objects
- Image manipulation: seamlessly insert or remove objects/elements in images
- Image to image translation while preserving semantics, e.g. existing source dataset annotations
- Simulation to reality translation while preserving elements, metrics, ...
- Image to image translation to cope with scarce data
This is achieved by combining powerful and customized generator architectures, bags of discriminators, and configurable neural networks and losses that ensure conservation of fundamental elements between source and target images.
Mario to Sonic while preserving the action (running, jumping, ...)
Car insertion (BDD100K) with Diffusion
Glasses removal with GANs
Real-time ring virtual try-on with GANs
Day to night (BDD100K) with Transformers and GANs
Clear to snow (BDD100K) by applying a generator multiple times to add snow incrementally
- SoTA image to image translation
- Semantic consistency: conservation of labels of many types (bounding boxes, masks, classes).
- SoTA discriminator models: projected, vision_aided, custom transformers.
- Advanced generators: real-time, transformers, hybrid transformers-CNN, Attention-based, UNet with attention, StyleGAN2
- Multiple models based on adversarial and diffusion generation: CycleGAN, CyCADA, CUT, Palette
- GAN data augmentation mechanisms: APA, discriminator noise injection, standard image augmentation, online augmentation through sampling around bounding boxes
- Output quality metrics: FID
- Server with REST API
- Support for both CPU and GPU
- Dockerized server
- Production-grade deployment in C++ via DeepDetect
- Linux
- Python 3
- CPU or NVIDIA GPU + CUDA CuDNN
Clone this repo:
git clone --recursive https://github.com/jolibrain/joliGAN.git
cd joliGAN
Install PyTorch and other dependencies (torchvision, visdom) with:
pip install -r requirements.txt --upgrade
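To quickly verify the installation (and GPU visibility on CUDA machines), you can run:

```python
# Optional sanity check that PyTorch is installed and, on GPU machines,
# that CUDA is visible (see prerequisites above).
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```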
Example: horse to zebra from two sets of images
Dataset: https://www.deepdetect.com/joligan/datasets/horse2zebra.zip
horse2zebra/
horse2zebra/trainA # horse images
horse2zebra/trainB # zebra images
horse2zebra/testA
horse2zebra/testB
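If you prefer fetching the example dataset from Python rather than with wget or a browser, a minimal sketch:

```python
# Download and unpack the horse2zebra example dataset into the current directory.
import urllib.request
import zipfile

url = "https://www.deepdetect.com/joligan/datasets/horse2zebra.zip"
urllib.request.urlretrieve(url, "horse2zebra.zip")
with zipfile.ZipFile("horse2zebra.zip") as zf:
    zf.extractall(".")
```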
Example: font number conversion
Dataset: https://www.deepdetect.com/joligan/datasets/mnist2USPS.zip
mnist2USPS/
mnist2USPS/trainA
mnist2USPS/trainA/0 # images of number 0
mnist2USPS/trainA/1 # images of number 1
mnist2USPS/trainA/2 # images of number 2
...
mnist2USPS/trainB
mnist2USPS/trainB/0 # images of target number 0
mnist2USPS/trainB/1 # images of target number 1
mnist2USPS/trainB/2 # images of target number 2
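A quick, hypothetical sanity check of this class-per-folder layout (the dataset path is a placeholder):

```python
# Count images per class directory in trainA and trainB.
from pathlib import Path

root = Path("/path/to/mnist2USPS")
for split in ("trainA", "trainB"):
    for cls_dir in sorted(p for p in (root / split).iterdir() if p.is_dir()):
        print(split, cls_dir.name, sum(1 for f in cls_dir.iterdir() if f.is_file()))
```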
Example: Add glasses to a face without modifying the rest of the face
Dataset: https://www.deepdetect.com/joligan/datasets/noglasses2glasses_ffhq_mini.zip
Full dataset: https://www.deepdetect.com/joligan/datasets/noglasses2glasses_ffhq.zip
noglasses2glasses_ffhq_mini
noglasses2glasses_ffhq_mini/trainA
noglasses2glasses_ffhq_mini/trainA/img
noglasses2glasses_ffhq_mini/trainA/img/0000.png # source image, e.g. face without glasses
...
noglasses2glasses_ffhq_mini/trainA/bbox
noglasses2glasses_ffhq_mini/trainA/bbox/0000.png # source mask, e.g. mask around eyes
...
noglasses2glasses_ffhq_mini/trainA/paths.txt # list of associated source / mask images
noglasses2glasses_ffhq_mini/trainB
noglasses2glasses_ffhq_mini/trainB/img
noglasses2glasses_ffhq_mini/trainB/img/0000.png # target image, e.g. face with glasses
...
noglasses2glasses_ffhq_mini/trainB/bbox
noglasses2glasses_ffhq_mini/trainB/bbox/0000.png # target mask, e.g. mask around glasses
...
noglasses2glasses_ffhq_mini/trainB/paths.txt # list of associated target / mask images
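For reference, a hypothetical helper that rebuilds such a paths.txt by pairing each image with the mask of the same filename; the exact path base written into paths.txt is an assumption, so compare with the paths.txt shipped in the downloaded dataset first:

```python
# Pair every image in img/ with the mask of the same filename in bbox/ and
# write the resulting "<image> <mask>" lines to paths.txt.
# Assumption: paths are written relative to the dataset root.
from pathlib import Path

def write_paths(split_dir):
    split = Path(split_dir)
    lines = []
    for img in sorted((split / "img").glob("*.png")):
        mask = split / "bbox" / img.name
        if mask.exists():
            lines.append(f"{img.relative_to(split.parent)} {mask.relative_to(split.parent)}")
    (split / "paths.txt").write_text("\n".join(lines) + "\n")

write_paths("noglasses2glasses_ffhq_mini/trainA")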
Example: Super Mario to Sonic while preserving the position and action, e.g. crouch, jump, still, ...
Dataset: https://www.deepdetect.com/joligan/datasets/online_mario2sonic_lite.zip
Full dataset: https://www.deepdetect.com/joligan/datasets/online_mario2sonic_full.tar
online_mario2sonic_lite
online_mario2sonic_lite/mario
online_mario2sonic_lite/mario/bbox
online_mario2sonic_lite/mario/bbox/r_mario_frame_19538.jpg.txt # contains bboxes, see format below
online_mario2sonic_lite/mario/imgs
online_mario2sonic_lite/mario/imgs/mario_frame_19538.jpg
online_mario2sonic_lite/mario/all.txt # list of associated source image / bbox file,
...
online_mario2sonic_lite/sonic
online_mario2sonic_lite/sonic/bbox
online_mario2sonic_lite/sonic/bbox/r_sonic_frame_81118.jpg.txt
online_mario2sonic_lite/sonic/imgs
online_mario2sonic_lite/sonic/imgs/sonic_frame_81118.jpg
online_mario2sonic_lite/sonic/all.txt # list of associated target image / bbox file
...
online_mario2sonic_lite/trainA
online_mario2sonic_lite/trainA/paths.txt # symlink to ../mario/all.txt
online_mario2sonic_lite/trainB
online_mario2sonic_lite/trainB/paths.txt # symlink to ../sonic/all.txt
List file format:
cat online_mario2sonic_lite/mario/all.txt
mario/imgs/mario_frame_19538.jpg mario/bbox/r_mario_frame_19538.jpg.txt
Bounding box file format, e.g. r_mario_frame_19538.jpg.txt:
2 132 167 158 218
in this order: cls xmin ymin xmax ymax
where cls is the class; in this dataset 2 means "running".
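A minimal sketch of a parser for this bounding box format:

```python
# Read "cls xmin ymin xmax ymax" lines from a bbox file.
def read_bboxes(path):
    boxes = []
    with open(path) as f:
        for line in f:
            cls, xmin, ymin, xmax, ymax = (int(v) for v in line.split())
            boxes.append((cls, xmin, ymin, xmax, ymax))
    return boxes

# read_bboxes("mario/bbox/r_mario_frame_19538.jpg.txt")
# -> [(2, 132, 167, 158, 218)]   # class 2 is "running" in this dataset
```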
Example: Image seasonal modification while preserving objects with masks (cars, pedestrians, ...) and the overall image weather (snow, rain, clear, ...) as a class
Dataset: https://www.deepdetect.com/joligan/datasets/daytime2dawn_dusk_lite.zip
daytime2dawn_dusk_lite
daytime2dawn_dusk_lite/dawn_dusk
daytime2dawn_dusk_lite/dawn_dusk/img
daytime2dawn_dusk_lite/dawn_dusk/mask
daytime2dawn_dusk_lite/daytime
daytime2dawn_dusk_lite/daytime/img
daytime2dawn_dusk_lite/daytime/mask
daytime2dawn_dusk_lite/trainA
daytime2dawn_dusk_lite/trainA/paths.txt
daytime2dawn_dusk_lite/trainB
daytime2dawn_dusk_lite/trainB/paths.txt
paths.txt format:
cat trainA/paths.txt
daytime/img/00054602-3bf57337.jpg 2 daytime/mask/00054602-3bf57337.png
in this order: source image path, image class, image mask path, where the image class in this dataset represents the weather class.
Other semantics are possible, e.g. labels produced by an algorithm that runs on both source and target images.
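A minimal sketch of a reader for this paths.txt format:

```python
# Each line: "<image path> <class> <mask path>", with paths relative to the dataset root.
def read_paths(path):
    entries = []
    with open(path) as f:
        for line in f:
            img, cls, mask = line.split()
            entries.append((img, int(cls), mask))
    return entries

for img, cls, mask in read_paths("trainA/paths.txt"):
    print(img, cls, mask)
```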
Training requires the following:
- GPU
- a checkpoints directory to be specified, in which model weights are stored
- a Visdom server; by default the training script starts a Visdom server on http://0.0.0.0:8097 if none is running
- Go to http://localhost:8097 to follow training losses and image result samples
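To check that a Visdom server is reachable before launching training (visdom is installed via requirements.txt), a small sketch:

```python
# Connect to the Visdom server used for monitoring and report whether it responds.
import visdom

viz = visdom.Visdom(server="http://localhost", port=8097, raise_exceptions=False)
print("Visdom reachable:", viz.check_connection())
```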
JoliGAN has (too) many options; for finer-grained control, see the full option list.
Modify as required and run with the following command line:
python3 train.py --dataroot /path/to/horse2zebra --checkpoints_dir /path/to/checkpoints --name horse2zebra \
--output_display_env horse2zebra --data_load_size 256 --data_crop_size 256 --train_n_epochs 200 \
--dataset_mode unaligned --train_n_epochs_decay 0 --model_type cut --G_netG mobile_resnet_attn
python3 train.py --dataroot /path/to/mnist2USPS --checkpoints_dir /path/to/checkpoints --name mnist2USPS \
--output_display_env mnist2USPS --data_load_size 180 --data_crop_size 180 --train_n_epochs 200 \
--data_dataset_mode unaligned_labeled_cls --train_n_epochs_decay 0 --model_type cut --cls_semantic_nclasses 10 \
--train_sem_use_label_B --train_semantic_cls --dataaug_no_rotate --dataaug_D_noise 0.001 \
--G_netG mobile_resnet_attn
python3 train.py --dataroot /path/to/noglasses2glasses_ffhq/ --checkpoints_dir /path/to/checkpoints/ \
--name noglasses2glasses --output_display_env noglasses2glasses --output_display_freq 200 --output_print_freq 200 \
--train_G_lr 0.0002 --train_D_lr 0.0001 --train_sem_lr_f_s 0.0002 --data_crop_size 256 --data_load_size 256 \
--data_dataset_mode unaligned_labeled_mask --model_type cut --train_semantic_mask --train_batch_size 2 \
--train_iter_size 1 --model_input_nc 3 --model_output_nc 3 --f_s_net unet --train_mask_f_s_B \
--train_mask_out_mask --f_s_semantic_nclasses 2 --G_netG mobile_resnet_attn --alg_cut_nce_idt \
--train_sem_use_label_B --D_netDs projected_d basic vision_aided --D_proj_interp 256 \
--D_proj_network_type efficientnet --train_G_ema --G_padding_type reflect --dataaug_no_rotate \
--data_relative_paths
python3 train.py --dataroot /path/to/online_mario2sonic/ --checkpoints_dir /path/to/checkpoints/ \
--name mario2sonic --output_display_env mario2sonic --output_display_freq 200 --output_print_freq 200 \
--train_G_lr 0.0002 --train_D_lr 0.0001 --train_sem_lr_f_s 0.0002 --data_crop_size 128 --data_load_size 180 \
--data_dataset_mode unaligned_labeled_mask_online --model_type cut --train_semantic_mask --train_batch_size 2 \
--train_iter_size 1 --model_input_nc 3 --model_output_nc 3 --f_s_net unet --train_mask_f_s_B \
--train_mask_out_mask --data_online_creation_crop_size_A 128 --data_online_creation_crop_delta_A 50 \
--data_online_creation_mask_delta_A 50 --data_online_creation_crop_size_B 128 \
--data_online_creation_crop_delta_B 15 --data_online_creation_mask_delta_B 15 \
--f_s_semantic_nclasses 2 --G_netG segformer_attn_conv \
--G_config_segformer models/configs/segformer/segformer_config_b0.py --alg_cut_nce_idt --train_sem_use_label_B \
--D_netDs projected_d basic vision_aided --D_proj_interp 256 --D_proj_network_type vitsmall \
--train_G_ema --G_padding_type reflect --dataaug_no_rotate --data_relative_paths
Trains a diffusion model to insert glasses onto faces.
python3 train.py --dataroot /path/to/noglasses2glasses_ffhq/ --checkpoints_dir /path/to/checkpoints/ \
--name noglasses2glasses --data_direction BtoA --output_display_env noglasses2glasses --gpu_ids 0,1 \
--model_type palette --train_batch_size 4 --train_iter_size 16 --model_input_nc 3 --model_output_nc 3 \
--data_relative_paths --train_G_ema --train_optim radam --data_dataset_mode self_supervised_labeled_mask \
--data_load_size 256 --data_crop_size 256 --G_netG unet_mha --data_online_creation_rand_mask_A \
--train_G_lr 0.00002 --train_n_epochs 400 --dataaug_no_rotate --output_display_freq 10000 \
--train_optim adamw --G_nblocks 2
JoliGAN reads the model configuration from a generated train_config.json file stored in the model directory. When loading a previously trained model, make sure the train_config.json file is present in that directory.
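To inspect the options recorded for a trained model (paths are placeholders, and a flat key/value layout is assumed):

```python
# Load the generated train_config.json stored next to the model weights and
# print a few of the recorded training options.
import json

with open("/path/to/checkpoints/horse2zebra/train_config.json") as f:
    cfg = json.load(f)
for key in sorted(cfg)[:10]:
    print(key, "=", cfg[key])
```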
Python scripts are provided for inference; they can be used as a baseline for integrating a model into another codebase.
cd scripts
python3 gen_single_image.py --model-in-file /path/to/model/latest_net_G_A.pth \
--img-in /path/to/source.jpg --img-out target.jpg
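The script above is the reference; the sketch below only illustrates the usual CycleGAN-style pre/post-processing (resize, normalize to [-1, 1], denormalize the output) around a generator that you would build and load with this repository's own code. The placeholder netG and the normalization convention are assumptions:

```python
# Sketch of typical pre/post-processing around a generator (assumption:
# 3-channel input/output normalized to [-1, 1], as in CycleGAN-style models).
import torch
from PIL import Image
from torchvision import transforms

netG = torch.nn.Identity()  # placeholder: replace with the generator built and loaded via this repo's code

to_tensor = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

img = to_tensor(Image.open("/path/to/source.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    out = netG(img)
out = ((out[0].clamp(-1, 1) + 1) / 2 * 255).to(torch.uint8).permute(1, 2, 0).cpu().numpy()
Image.fromarray(out).save("target.jpg")
```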
Using a pretrained glasses insertion model (see above):
python3 gen_single_image_diffusion.py --model-in-file /path/to/model/latest_net_G_A.pth --img-in /path/to/source.jpg \
--mask-in /path/to/mask.jpg --img-out target.jpg --img-size 256
The mask image contains 1 where the object is to be inserted and 0 elsewhere.
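For quick experiments, a mask following this convention can be generated in a few lines (the box coordinates and image size are placeholders; PNG avoids JPEG compression artifacts on the 0/1 values):

```python
# Build a 0/1 insertion mask: 1 inside the rectangle where the object goes, 0 elsewhere.
import numpy as np
from PIL import Image

def make_box_mask(width, height, xmin, ymin, xmax, ymax, out_path="mask.png"):
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[ymin:ymax, xmin:xmax] = 1
    Image.fromarray(mask).save(out_path)

make_box_mask(256, 256, 96, 110, 160, 146)
```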
Ensure the server dependencies are installed:
pip install fastapi uvicorn
Then run the server:
server/run.sh --host localhost --port 8000
To launch tests before new commits:
bash scripts/run_tests.sh /path/to/dir
Name | Paper |
---|---|
CycleGAN | https://arxiv.org/abs/1703.10593 |
CyCADA | https://arxiv.org/abs/1711.03213 |
CUT | https://arxiv.org/abs/2007.15651 |
RecycleGAN | https://arxiv.org/abs/1808.05174 |
StyleGAN2 | https://arxiv.org/abs/1912.04958 |
Architecture | Number of parameters |
---|---|
Resnet 9 blocks | 11.378M |
Mobile resnet 9 blocks | 1.987M |
Resnet attn | 11.823M |
Mobile resnet attn | 2.432M |
Segformer b0 | 4.158M |
Segformer attn b0 | 4.60M |
Segformer attn b1 | 14.724M |
Segformer attn b5 | 83.016M |
UNet with mha | ~60M configurable |
ITTR | ~30M configurable |
To build the Docker images for the joliGAN server:
docker build -t jolibrain/joligan_build -f docker/Dockerfile.build .
docker build -t jolibrain/joligan_server -f docker/Dockerfile.server .
To run the joliGAN Docker image:
nvidia-docker run jolibrain/myjoligan
If you want to contribute, please use the black code formatter. Install it with:
pip install black
Usage:
black .
To format the code automatically before every commit:
pip install pre-commit
pre-commit install
JoliGAN is created and maintained by Jolibrain.
The code makes use of pytorch-CycleGAN-and-pix2pix, CUT, AttentionGAN, and MoNCE, among others.