/Few-Shot-Patch-Based-Training

A fork of the implementation of the SIGGRAPH 2020 paper Interactive Video Stylization Using Few-Shot Patch-Based Training

This implementation works on Python 3.9 and combines all the scripts into one script.

[x] streamline everything of Few-Shot-Patch-Based-Training into one script

[x] automatically get frames to apply style on with --framegap

[x] gif support

[x] make a GUI

[x] mask support

[] optimize the movement tracking scripts to run on a GPU or multiple CPU cores (in progress)

[] add linux support (in progress)

[] support videos over 1000 frames (in progress)

why this fork exists

The original repo was hard to comprehend and required a lot of work to get started. My goal with this repo is to make it as automated as possible.

to run this script

run terminal as administrator

cd C:/path/to/Few-Shot-Patch-Based-Training-master
python _tools\fewshot_UI.py

The terminal will pause after processing the frames and folders. Take the frames from the folder it points you to, apply a style to them, and export the stylized frames to the folder it names. Then press Enter a couple of times to resume the script :)

If you want to process a video that has scene changes, I recommend this tool for splitting the video into pieces.

install guide

Make sure there are no spaces in the directories that lead to the Few-Shot-Patch-Based-Training-master folder.

Download prebuilt OpenCV-4.2.0 for Windows.

put the opencv-4.2.0 folder in Few-Shot-Patch-Based-Training-master\_tools\disflow

As it links against OpenCV-4.2.0, it expects Few-Shot-Patch-Based-Training-master\_tools\disflow\opencv-4.2.0\bin in PATH.

Add the following entries to the PATH system variable:

put "C:\path\to\Few-Shot-Patch-Based-Training-master\_tools\disflow" in PATH

put "C:\path\to\Few-Shot-Patch-Based-Training-master\_tools\gauss" in PATH

put "C:\path\to\Few-Shot-Patch-Based-Training-master\_tools\bilateralAdv" in PATH

put "C:\path\to\Few-Shot-Patch-Based-Training-master\_tools\disflow\opencv-4.2.0\bin" in PATH
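A quick way to confirm that the PATH entries above took effect is to look the tools up from Python. This is only a sketch: the executable names disflow, gauss, and bilateralAdv come from this README, while the helper name find_missing_tools is made up for illustration. It does not check for the OpenCV DLL.

```python
import shutil

# Executable names taken from this README. On Windows, shutil.which also
# tries the extensions listed in PATHEXT, so ".exe" is not needed here.
REQUIRED_TOOLS = ["disflow", "gauss", "bilateralAdv"]


def find_missing_tools(tool_names):
    """Return the tools from tool_names that cannot be found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH.")
```

Open a new terminal before running it, since PATH changes usually do not reach terminals that were already open.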

pip installs

(venv should work now thanks to alpkabac)

pip install ruamel.yaml
pip install pysimplegui
pip install Gooey
pip install opencv-python
pip install scikit-build
pip install cython
pip install Pillow
pip install PyYAML==5.4
pip install scikit-image==0.18.1
pip install scipy==1.6.2
pip install tensorflow==2.7.0
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html  
pip install numpy==1.21.2
pip install moviepy
pip install numba
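After installing, it can help to confirm that the pinned versions are the ones actually active in your environment. The sketch below uses the standard library importlib.metadata (available on Python 3.9). The helper name find_version_mismatches is hypothetical, and torch is left out because its version string carries a CUDA suffix.

```python
from importlib import metadata

# Pinned versions copied from the pip commands above.
PINNED = {
    "PyYAML": "5.4",
    "scikit-image": "0.18.1",
    "scipy": "1.6.2",
    "tensorflow": "2.7.0",
    "numpy": "1.21.2",
}


def find_version_mismatches(pinned):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_version_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

If nothing is printed, every pinned package matches.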

imagemagick install

Download the latest 64-bit HDR version of ImageMagick from here

BTW, this works best if your original footage is super clean (no flickering, etc.)

just run the command, the guide on how to do it will be displayed in the command terminal after you run the script (you'll see)

Interactive Video Stylization Using Few-Shot Patch-Based Training

The official implementation of

Interactive Video Stylization Using Few-Shot Patch-Based Training
O. Texler, D. Futschik, M. Kučera, O. Jamriška, Š. Sochorová, M. Chai, S. Tulyakov, and D. Sýkora
[WebPage], [Paper], [BiBTeX]

Run

Download the testing-data.zip, and unzip. The _train folder is expected to be next to the _gen folder.

Pre-Trained Models

If you just want to quickly test the network, here are some pre-trained models: pre-trained-models.zip. Unzip, and follow with the Generate step. Be sure to set the correct --checkpoint path when calling generate.py, e.g., _pre-trained-models/Zuzka2/model_00020.pth.

Train

To train the network, run train.py. See the example command below:

train.py --config "_config/reference_P.yaml" 
		 --data_root "Zuzka2_train" 
		 --log_interval 1000 
		 --log_folder logs_reference_P

Every 1000 (log_interval) epochs, train.py saves the current generator to logs_reference_P (log_folder) and validates/runs the generator on the _gen data; the result is saved in Zuzka2_gen/res__P.
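The example command is split over several lines for readability, so it cannot be pasted into a Windows terminal as-is. One way around that is to assemble it as an argument list in Python. This is a sketch only: the flags are the ones shown in this section, the helper name build_train_command is hypothetical, and it assumes you run it from the folder that contains train.py.

```python
import subprocess
import sys


def build_train_command(config, data_root, log_interval, log_folder):
    """Assemble the train.py call from this section as an argument list."""
    return [
        sys.executable, "train.py",
        "--config", config,
        "--data_root", data_root,
        "--log_interval", str(log_interval),
        "--log_folder", log_folder,
    ]


if __name__ == "__main__":
    command = build_train_command(
        "_config/reference_P.yaml", "Zuzka2_train", 1000, "logs_reference_P"
    )
    print(" ".join(command))
    # Uncomment to start training from the repository root:
    # subprocess.run(command, check=True)
```

Passing a list to subprocess.run avoids quoting problems with paths, which is why the arguments are kept separate instead of being joined into one string.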

Generate

To generate the results, run generate.py.

generate.py --checkpoint "Zuzka2_train/logs_reference_P/model_00020.pth" 
	    --data_root "Zuzka2_gen"
	    --dir_input "input_filtered"
	    --outdir "Zuzka2_gen/res_00020" 
	    --device "cuda:0"

To generate the results on live webcam footage, run generate_webcam.py. To stop the generation, press q while the preview window is active.

generate_webcam.py --checkpoint "Zuzka2_train/logs_reference_P/model_00020.pth" 
	    --device "cuda:0"
	    --resolution 1280 720
	    --show_original 1
	    --resize 256

An optional resolution argument has been added, but the images will always be cropped to a square and resized to resize x resize to shorten the delay.
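To make the crop concrete, the sketch below computes the square region for a given capture resolution. It assumes the crop is taken from the center of the frame, which this README does not state; the helper name center_square_box is made up for illustration and is not part of generate_webcam.py.

```python
def center_square_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


if __name__ == "__main__":
    # For --resolution 1280 720 the square is 720 px wide and 720 px high.
    print(center_square_box(1280, 720))  # (280, 0, 1000, 720)
```

With --resize 256, that 720x720 region would then be scaled down to 256x256 before it goes through the network.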

Installation

Tested on Windows 10, Python 3.7.8, and CUDA 10.2 with the following Python packages:

numpy                  1.19.1
opencv-python          4.4.0.40
Pillow                 7.2.0
PyYAML                 5.3.1
scikit-image           0.17.2
scipy                  1.5.2
tensorflow             1.15.3 (tensorflow is used only in logger.py, I will remove this unnecessary dependency soon)
torch                  1.6.0
torchvision            0.7.0

Temporal Consistency [Optional]

This section is optional. It describes steps that can help to maintain temporal coherency of the resulting video sequence. All example commands and build scripts in this section assume Windows; however, it should be really straightforward to build it and run it on Linux/MacOS.

Because our technique does not explicitly enforce temporal consistency, it offers many advantages, e.g., parallel processing and fast training, but the resulting stylized sequence may contain a disturbing amount of flickering. While temporal inconsistency can be caused by various factors, below we discuss how to deal with the two most crucial ones.

Noise in the Input Sequence

The input video sequence captured by a camera usually contains some amount of temporal noise. While this noise might not be visible to the naked eye or might seem negligible, the network tends to amplify it. To deal with this issue, we propose to filter the input sequence using a time-aware bilateral filter.

First, optical flow has to be computed. Use the optical flow tool in _tools/disflow. See section Build disflow below on how to build the tool. Once disflow.exe is built and present in the PATH, see and modify the first few lines of _tools/tool_disflow.py, and run it. It reads PNGs from the input folder and stores the optical flow in the flow_fwd and flow_bwd folders.

Once the optical flow is computed, use the time-aware bilateral filter tool _tools/bilateralAdv to filter the sequence. See section Build bilateralAdv below on how to build the tool. Once bilateralAdv.exe is built and present in the PATH, see and modify the first few lines of _tools/tool_bilateralAdv.py, and run it. It reads PNGs from the input folder and optical flow data from flow_fwd and flow_bwd; it stores the filtered sequence in input_filtered.

Note, feel free to parallelize the for loop in _tools/tool_bilateralAdv.py; bilateralAdv.exe uses optical flow and can be run frame by frame independently. Also, feel free to optimize bilateralAdv.exe so that it uses multiple CPU cores or even a GPU ... I am thrilled to see your pull request :-)
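Since each frame can be filtered independently, the per-frame calls can be handed to a small worker pool. The following is a generic sketch, not the code in _tools/tool_bilateralAdv.py: the helper name run_commands_in_parallel is hypothetical, and the placeholder commands must be replaced with the actual per-frame bilateralAdv call taken from that script's loop body.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_commands_in_parallel(commands, max_workers=4):
    """Run each command (an argument list) and return the exit codes in order."""
    def run_one(command):
        return subprocess.run(command).returncode

    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Placeholder commands that do nothing; swap in one command per frame.
    commands = [[sys.executable, "-c", "pass"] for _ in range(3)]
    print(run_commands_in_parallel(commands))
```

A nonzero entry in the returned list marks a frame whose command failed, so those frames can be rerun on their own.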

Finally, to do the training and inference, use the filtered images from input_filtered instead of the original noisy input images. Hopefully, the results will be more stable in time.

Ambiguity in the Training Data

As the network is trained on small patches (32x32 px by default), it is likely that multiple 32x32 px patches from an input RGB frame will be very similar. For instance, if there is sky in the background of the input image, patches from the left and right parts of the sky will likely be very similar. The problem is that in the stylized exemplar, these patches might be stylized slightly differently. That is the ambiguity: during training, multiple similar input patches will be mapped to different stylized patches. To deal with this, we propose to use auxiliary RGB input images that make all input patches unique.
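The effect can be seen on a toy example. The sketch below is an illustration only, not the method's code: it uses a tiny flat "sky" image, 3x3 patches instead of 32x32, and a made-up auxiliary channel whose value differs at every pixel as a stand-in for the auxiliary images.

```python
def extract_patches(image, size):
    """All size x size patches of a list-of-rows image, as hashable tuples."""
    rows, cols = len(image), len(image[0])
    return [
        tuple(tuple(image[r + dr][c + dc] for dc in range(size)) for dr in range(size))
        for r in range(rows - size + 1)
        for c in range(cols - size + 1)
    ]


# A flat "sky": every pixel has the same value, so every patch looks the same.
sky = [[200] * 6 for _ in range(6)]
# Toy auxiliary channel (an assumption for this demo): unique value per pixel.
aux = [[r * 6 + c for c in range(6)] for r in range(6)]
# Pair each sky pixel with its auxiliary value.
combined = [[(sky[r][c], aux[r][c]) for c in range(6)] for r in range(6)]

patches_plain = extract_patches(sky, 3)
patches_combined = extract_patches(combined, 3)

print(len(set(patches_plain)), "unique of", len(patches_plain))        # 1 unique of 16
print(len(set(patches_combined)), "unique of", len(patches_combined))  # 16 unique of 16
```

Without the extra channel, all 16 patches are identical, so the network would be asked to map one input to several different stylized outputs. With it, every patch is distinct.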

First, optical flow has to be computed. Use the optical flow tool in _tools/disflow. See section Build disflow below on how to build the tool. Once disflow.exe is built and present in the PATH, see and modify the first few lines of _tools/tool_disflow.py, and run it. It reads PNGs from the input folder and stores the optical flow in the flow_fwd and flow_bwd folders.

Once the optical flow is computed, use _tools/gauss to compute auxiliary Gaussian mixture images. See section Build gauss below on how to build the tool. Once gauss.exe is built and present in the PATH, see and modify the first few lines of _tools/tool_gauss.py, and run it. It reads mask images from the mask folder (these masks can but do not need to match the masks you use during training, see the section below for more info) and optical flow data from flow_fwd and flow_bwd; it outputs two different Gaussian mixtures in input_gdisko_gauss_r10_s10 (smaller circles) and input_gdisko_gauss_r10_s15 (larger circles). Pick one of them, e.g., input_gdisko_gauss_r10_s10; if it does not work well, try the other one.

Place the folder input_gdisko_gauss_r10_s10 next to your input folder in both the _gen and the _train folder. In the _train folder, input_gdisko_gauss_r10_s10 will contain only the frames corresponding to the stylized keyframes, e.g., 001.png for the Maruska640 sequence or 000.png, 030.png, 070.png, and 103.png for the Zuzka2 sequence.

To train, do not forget to use the correct config file, e.g., --config "_config/reference_P_disco1010.yaml", when running the train.py script. To run inference with the generate.py script, use the optional argument --dir_x1 input_gdisko_gauss_r10_s10, which tells generate.py to load the images from input_gdisko_gauss_r10_s10.
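The folder placement described above can be scripted. This sketch assumes the full Gaussian mixture folder already sits in the _gen folder and copies only the keyframe images into the matching folder under _train. The helper name copy_keyframe_images is hypothetical, and the folder and file names in the usage example are the Zuzka2 ones mentioned in this section.

```python
import shutil
from pathlib import Path


def copy_keyframe_images(gen_root, train_root, folder_name, keyframes):
    """Copy only the keyframe images of folder_name from gen_root to train_root."""
    source_dir = Path(gen_root) / folder_name
    target_dir = Path(train_root) / folder_name
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in keyframes:
        source = source_dir / name
        if not source.is_file():
            raise FileNotFoundError(f"missing keyframe image: {source}")
        shutil.copy2(source, target_dir / name)
        copied.append(name)
    return copied


if __name__ == "__main__":
    gen_root = Path("Zuzka2_gen")
    if (gen_root / "input_gdisko_gauss_r10_s10").is_dir():
        print(copy_keyframe_images(
            gen_root, "Zuzka2_train", "input_gdisko_gauss_r10_s10",
            ["000.png", "030.png", "070.png", "103.png"],
        ))
```

The function raises instead of skipping a missing keyframe, because a _train folder with fewer auxiliary images than stylized keyframes would be easy to overlook.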

Masks for Gauss

When gauss.exe runs, the Gaussian mixtures are generated for every mask image and propagated through the sequence using optical flow. If multiple mask images are provided, the resulting Gaussian circles are stacked on top of each other (and they will cover potential holes). The masks can be (and in most cases will be) fully white images. If you are not sure which frames to pick as masks, pick the same frames as your keyframes and/or the first and last frame of the sequence. Inspect the Gaussian mixture results, e.g., input_gdisko_gauss_r10_s10; if there are large black holes (larger than 100x100 px), add one more mask image for the frame where the black holes are the largest.
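The rule of thumb for choosing mask frames can be written down as a tiny helper. This is just a sketch of that suggestion (keyframes plus the first and last frame); the function name suggest_mask_frames is made up, and it only picks file names, it does not create the mask images.

```python
def suggest_mask_frames(all_frames, keyframes):
    """Union of the keyframes and the first and last frame, in sequence order."""
    if not all_frames:
        return []
    wanted = set(keyframes) | {all_frames[0], all_frames[-1]}
    return [frame for frame in all_frames if frame in wanted]


if __name__ == "__main__":
    frames = [f"{i:03d}.png" for i in range(104)]  # 000.png ... 103.png
    print(suggest_mask_frames(frames, ["000.png", "030.png", "070.png", "103.png"]))
```

If the resulting mixtures still show large black holes, add the frame where the holes are largest to the keyframes argument and run the tool again.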

Build Temporal Consistency Tools

Build disflow

On Windows, try to use the prebuilt disflow.exe. Otherwise, use _tools/disflow/build_win.bat to build disflow.exe yourself (on Linux/MacOS, get inspired by the build script, it should be really easy to build it). As it links against OpenCV-4.2.0, it expects opencv_world420.dll in PATH. Download OpenCV-4.2.0; they offer a prebuilt Windows pack. Feel free to modify the build script to use a different version of OpenCV. Note, the OpenCV includes are provided and located at _tools\disflow\opencv-4.2.0\include, and the Windows .lib files are provided and located at _tools\disflow\opencv-4.2.0\lib.

Build bilateralAdv

On Windows, try to use prebuilt bilateralAdv.exe. Otherwise, use _tools/bilateralAdv/build_win.bat to build bilateralAdv.exe yourself (on Linux/MacOS, get inspired by the build script, it should be really easy to build it).

Build gauss

On Windows, try to use prebuilt gauss.exe. Otherwise, use _tools/gauss/build_win.bat to build gauss.exe yourself (on Linux/MacOS, get inspired by the build script, it should be really easy to build it).

Other Implementations

Credits

License

  • The Patch-Based Training method is not patented, and we do not plan on patenting.
  • However, you should be aware that certain parts of the code in this repository were written when Ondrej Texler and David Futschik were employed by Snap Inc. If you find this project useful for your commercial interests, please reimplement it.

Citing

If you find Interactive Video Stylization Using Few-Shot Patch-Based Training useful for your research or work, please use the following BibTeX entry.

@Article{Texler20-SIG,
    author    = "Ond\v{r}ej Texler and David Futschik and Michal Ku\v{c}era and Ond\v{r}ej Jamri\v{s}ka and \v{S}\'{a}rka Sochorov\'{a} and Menglei Chai and Sergey Tulyakov and Daniel S\'{y}kora",
    title     = "Interactive Video Stylization Using Few-Shot Patch-Based Training",
    journal   = "ACM Transactions on Graphics",
    volume    = "39",
    number    = "4",
    pages     = "73",
    year      = "2020",
}