/APISR

APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)

APISR is an image&video upscaler that aims at restoring and enhancing low-quality low-resolution anime images and video sources with various degradations from real-world scenarios.

ArxivHF DemoOpen In ColabHF Demo

🔥 Update | 👀 Visualization | 🔧 Installation | 🏰 Model Zoo |Inference | 🧩 Dataset Curation | 💻 Train

Update 🔥🔥🔥

  • Release Paper version implementation of APISR
  • Release different upscaler factor weight (for 2x, 4x and more)
  • Gradio demo (with online)
  • Provide weight with different architecture (DAT-Small)
  • Add the result combined with Toon Crafter
  • Release the weight trained with Diffusion Generated Images
  • Create a Project Page
  • Some Online Demo for Chinese users && README in Chinese

If you like APISR, please help star this repo. Thanks! 🤗

Visualization (Click them for the best view!) 👀

Toon Crafter Examples Upscale

Please check toon_crafter_upscale

Installation 🔧

git clone git@github.com:Kiteretsu77/APISR.git
cd APISR

# Create conda env
conda create -n APISR python=3.10
conda activate APISR

# Install Pytorch and other packages needed
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt


# To be absolutely sure that the tensorboard can execute. I recommend the following CMD from "https://github.com/pytorch/pytorch/issues/22676#issuecomment-534882021"
pip uninstall tb-nightly tensorboard tensorflow-estimator tensorflow-gpu tf-estimator-nightly
pip install tensorflow

# Install FFMPEG [Only needed for training and dataset curation stage; inference only does not need ffmpeg] (the following is for the linux system, Windows users can download ffmpeg from https://ffmpeg.org/download.html)
sudo apt install ffmpeg

Gradio Fast Inference ⚡⚡⚡

Gradio option doesn't need to prepare the weight from the user side but they can only process one image each time.

Online demo can be found at https://huggingface.co/spaces/HikariDawn/APISR (HuggingFace) or https://colab.research.google.com/github/camenduru/APISR-jupyter/blob/main/APISR_jupyter.ipynb (Colab)

Local Gradio can be created by running the following:

python app.py

Note: Gradio is designed for fast inference, so we will automatically download existing weights and downsample to 720P to ease the VRAM consumption. For a full grounder inference, please check the regular inference section below.

Regular Inference ⚡⚡

  1. Download the model weight from model zoo and put the weight to "pretrained" folder.

  2. Then, Execute (single image/video or a directory mixed with images&videos are all ok!)

    python test_code/inference.py --input_dir XXX  --weight_path XXX  --store_dir XXX

    If the weight you download is paper weight, the default argument of test_code/inference.py is capable of executing sample images from "assets" folder

Dataset Curation 🧩

Our dataset curation pipeline is under dataset_curation_pipeline folder.

You can collect your dataset by sending videos (mp4 or other format) into the pipeline and get the least compressed and the most informative images from the video sources.

  1. Download IC9600 weight (ck.pth) from https://drive.google.com/drive/folders/1N3FSS91e7FkJWUKqT96y_zcsG9CRuIJw and place it at "pretrained/" folder (else, you can define a different --IC9600_pretrained_weight_path in the following collect.py execution)

  2. With a folder with video sources, you can execute the following to get a basic dataset (with ffmpeg installed):

    python dataset_curation_pipeline/collect.py --video_folder_dir XXX --save_dir XXX
  3. Once you get an image dataset with various aspect ratios and resolutions, you can run the following scripts

    Be careful to check uncropped_hr && degrade_hr_dataset_path && train_hr_dataset_path (we will use these path in opt.py setting during training stage)

    In order to decrease memory utilization and increase training efficiency, we pre-process all time-consuming pseudo-GT (train_hr_dataset_path) at the dataset preparation stage.

    But, in order to create a natural input for prediction-oriented compression, in every epoch, the degradation started from the uncropped GT (uncropped_hr), and LR synthetic images are concurrently stored. The cropped HR GT dataset (degrade_hr_dataset_path) and cropped pseudo-GT (train_hr_dataset_path) are fixed in the dataset preparation stage and won't be modified during training.

    Be careful to check if there is any OOM. If there is, it will be impossible to get the correct dataset preparation. Usually, this is because num_workers in scripts/anime_strong_usm.py is too big!

    bash scripts/prepare_datasets.sh

Train 💻

The whole training process can be done in one RTX3090/4090!

  1. Prepare a dataset (AVC / API) that is preprocessed by STEP 2 & 3 in Dataset Curation

    In total, you will have 3 folders prepared before executing the following commands:

    --> uncropped_hr: uncropped GT

    --> degrade_hr_dataset_path: cropped GT

    --> train_hr_dataset_path: cropped Pseudo-GT

  2. Train: Please check opt.py carefully to set the hyperparameters you want (modifying Frequently Changed Setting is usually enough).

    Note1: When you execute the following, we will create a "tmp" folder to hold generated lr images for sanity check. You can modify the code to delete it if you want.

    Note2: If you have a strong CPU, and if you want to accelerate, you can increase parallel_num in the opt.py.

    Step1 (Net L1 loss training): Run

    python train_code/train.py 

    The trained model weights will be inside the folder 'saved_models' (same as checkpoints)

    Step2 (GAN Adversarial Training):

    1. Change opt['architecture'] in opt.py to "GRLGAN" and change batch size if you need. BTW, I don't think that, for personal training, it is needed to train 300K iter for GAN. I did that in order to follow the same setting as in AnimeSR and VQDSR, but 100K ~ 130K should have a decent visual result.

    2. Following previous works, GAN should start from L1 loss pre-trained network, so please carry a pretrained_path (the default path below should be fine)

    python train_code/train.py --pretrained_path saved_models/grl_best_generator.pth 

Citation

Please cite us if our work is useful for your research.

@inproceedings{wang2024apisr,
  title={APISR: Anime Production Inspired Real-World Anime Super-Resolution},
  author={Wang, Boyang and Yang, Fengyu and Yu, Xihang and Zhang, Chao and Zhao, Hanbin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={25574--25584},
  year={2024}
}

Disclaimer

This project is released for academic use only. We disclaim responsibility for the distribution of the model weight and sample images. Users are solely liable for their actions. The project contributors are not legally affiliated with, nor accountable for, users' behaviors.

License

This project is released under the GPL 3.0 license. Also, check the disclaimer.

📧 Contact

If you have any questions, please feel free to contact me at hikaridawn412316@gmail.com or boyangwa@umich.edu.

🧩 Projects that use APISR

If you develop/use APISR in your projects, welcome to let me know. I will write all of them here. Thanks!

🤗 Acknowledgement

  • VCISR: My code base is based on my previous paper (WACV 2024).
  • IC9600: The dataset curation pipeline uses IC9600 code to score image complexity level.
  • danbooru-pretrained: Our Anime Dataset (Danbooru) pretrained RESNET50 model.
  • Jupyter Demo: The jupter notebook demo is from camenduru.
  • AVIF&HEIF: The degradation of AVIF and HEFI is from pillow_heif.
  • DAT: The DAT architecture we use for 4x scaling in model zoo is coming from this link.