TorchGANime: Video generation of anime content conditioned on two frames

Paper | Presentation

tl;dr This is the PyTorch implementation of GANime, a model capable of generating videos of anime content from the first and last frames. The model is trained on a custom dataset based on the Kimetsu no Yaiba anime. It is composed of two models: a VQ-GAN for image generation and a GPT2 transformer that generates the video frame by frame.
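The frame-by-frame idea can be pictured with a minimal, dependency-free sketch. It is not the repository's code: `generate_video`, `encode`, `predict_next` and `decode` are hypothetical names, and the way the real model conditions on the two frames is an assumption here. In the real project `encode`/`decode` would correspond to the VQ-GAN and `predict_next` to the GPT2 transformer; below they are replaced by toy stand-ins so the loop can run on its own.

```python
from typing import Callable, List

Tokens = List[int]


def generate_video(
    first_frame: Tokens,
    last_frame: Tokens,
    num_frames: int,
    encode: Callable[[Tokens], Tokens],
    predict_next: Callable[[Tokens, List[Tokens]], Tokens],
    decode: Callable[[Tokens], Tokens],
) -> List[Tokens]:
    """Autoregressive loop: each new frame is predicted from the frames
    generated so far and from the target last frame (assumed conditioning)."""
    last_tokens = encode(last_frame)
    generated = [encode(first_frame)]
    for _ in range(num_frames - 1):
        generated.append(predict_next(last_tokens, generated))
    return [decode(tokens) for tokens in generated]


# Toy stand-ins (hypothetical): frames are already lists of integer tokens.
def toy_encode(frame: Tokens) -> Tokens:
    return list(frame)


def toy_decode(tokens: Tokens) -> Tokens:
    return list(tokens)


def toy_predict_next(last_tokens: Tokens, generated: List[Tokens]) -> Tokens:
    # Move every token one step towards the corresponding last-frame token.
    previous = generated[-1]
    return [
        p + (1 if t > p else -1 if t < p else 0)
        for p, t in zip(previous, last_tokens)
    ]


video = generate_video(
    [0, 0], [3, 3], 4, toy_encode, toy_predict_next, toy_decode
)
```

With these stand-ins, `video` starts at the first frame and reaches the last frame after four steps; a trained transformer would instead sample the tokens of each next frame.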

The original project is a Master's thesis carried out by Farid Abdalla at HES-SO in partnership with Osaka Prefecture University (now renamed Osaka Metropolitan University) in Japan, and is available in this repository.

All implementation details are available in this pdf.

Intermediate results

Here are some intermediate results obtained while training the model. The grey frame at the end appears because the length of each generated video is determined by the longest video in the batch.
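One plausible reading of this effect, offered as an assumption rather than a description of the actual data pipeline, is that shorter videos in a batch are padded up to the length of the longest one. The sketch below uses a hypothetical helper `pad_videos` and plain strings in place of image tensors to show how such padding adds trailing filler frames.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def pad_videos(videos: Sequence[Sequence[Frame]], pad_frame: Frame) -> List[List[Frame]]:
    """Pad every video with `pad_frame` so all match the longest one."""
    max_len = max(len(video) for video in videos)
    return [list(video) + [pad_frame] * (max_len - len(video)) for video in videos]


# Two videos of different lengths; strings stand in for real frames.
batch = pad_videos([["a1", "a2", "a3"], ["b1"]], pad_frame="grey")
```

Here the second video gains two trailing `"grey"` frames, while the longest video is left unchanged.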

Generated videos

Ground truth