VGGnet: Video Graphic Generation network

🥉 3rd YAICON Novelty Prize!!

This project combines a video generation model with ImageBind to build a pipeline that can generate video from multimodal input (text, image, or audio).

We prepare text-text, image-text, and audio-text datasets to generate paired embeddings from ImageBind and the T5 embedding model.

python -m embedding.mapper_train
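
As a rough illustration of what training a mapper on paired embeddings can look like, here is a minimal PyTorch sketch. It is not the code in embedding.mapper_train: the `EmbeddingMapper` class, the `train_mapper` function, the MLP architecture, the MSE loss, and the tiny dimensions are all assumptions made for the example, and random tensors stand in for real ImageBind and T5 embeddings. It also treats each embedding as a single vector, which may differ from how the repository handles T5 outputs.

```python
import torch
from torch import nn

# Placeholder sizes chosen for the example, NOT taken from the repository.
SRC_DIM = 16  # stand-in for the ImageBind embedding size
TGT_DIM = 8   # stand-in for the T5 embedding size


class EmbeddingMapper(nn.Module):
    """Hypothetical MLP that maps a source embedding to a target embedding."""

    def __init__(self, src_dim: int, tgt_dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(src_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, tgt_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def train_mapper(src: torch.Tensor, tgt: torch.Tensor, epochs: int = 200, lr: float = 1e-2):
    """Fit the mapper on paired embeddings with an MSE loss.

    Returns the trained model and the list of per-epoch loss values.
    """
    mapper = EmbeddingMapper(src.shape[1], tgt.shape[1])
    optimizer = torch.optim.Adam(mapper.parameters(), lr=lr)
    losses = []
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(mapper(src), tgt)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return mapper, losses


torch.manual_seed(0)
# Random tensors stand in for precomputed (ImageBind, T5) embedding pairs.
src = torch.randn(64, SRC_DIM)
tgt = src @ torch.randn(SRC_DIM, TGT_DIM)  # synthetic target with a learnable relation
mapper, losses = train_mapper(src, tgt)
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

In a real run, `src` and `tgt` would be replaced by embeddings computed from the text-text, image-text, and audio-text pairs described above.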

Change the prompts to text, an image path, or an audio path.
You can download our trained weights in the models folder using this link.

python -m run_inference
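
Because a prompt can be plain text, an image path, or an audio path, one simple way to tell them apart is by file extension. The sketch below is only an illustration of that idea: `detect_modality` and the extension sets are hypothetical and are not part of run_inference, which may distinguish prompt types differently.

```python
from pathlib import Path

# Hypothetical extension lists; adjust to the formats your loaders accept.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}
AUDIO_EXTS = {".wav", ".mp3", ".flac"}


def detect_modality(prompt: str) -> str:
    """Classify a prompt as 'image', 'audio', or 'text' from its file extension."""
    suffix = Path(prompt).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in AUDIO_EXTS:
        return "audio"
    # Anything without a known media extension is treated as a text prompt.
    return "text"


# Example prompts (made-up paths for illustration).
prompts = ["a dog running on the beach", "assets/dog.jpg", "assets/bark.wav"]
for p in prompts:
    print(f"{detect_modality(p)}: {p}")
```

A check like this only looks at the string, so it does not verify that the file exists or can be decoded.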