YAICON-VGGnet

[🥉YAICON 2023] Generates a video with multimodal input through ImageBind (Dec. 2023)

VGGnet: Video Graphic Generation network

🥉 3rd YAICON Novelty Prize!!

Members

  • 박찬혁: PM, AI lead
  • 최가윀: AI
  • 박승혞: AI
  • 유선재: Data
  • 제갈걎: Data



This project combines a video generation model with ImageBind to build a pipeline that can generate video from multimodal input (text, image, or audio).

1. Dataset

We prepare text-text, image-text, and audio-text datasets to generate paired embeddings from ImageBind and the T5 embedding model.
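
As an illustration of what one paired record might look like, here is a minimal sketch. The `EmbeddingPair` class, the `build_pairs` helper, and the two toy encoders are hypothetical names introduced for this example and are not taken from the repository. In practice the encoders would be ImageBind and T5.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class EmbeddingPair:
    """One training example: an ImageBind embedding and its T5 counterpart."""
    modality: str      # "text", "image", or "audio"
    source: str        # raw text, or a path to an image/audio file
    text: str          # the text paired with the source
    imagebind: Vector  # embedding of `source` (input to the mapper)
    t5: Vector         # embedding of `text` (target of the mapper)


def build_pairs(
    samples: Sequence[Tuple[str, str]],
    modality: str,
    embed_source: Callable[[str], Vector],
    embed_text: Callable[[str], Vector],
) -> List[EmbeddingPair]:
    """Embed every (source, text) sample with both encoders."""
    return [
        EmbeddingPair(modality, source, text, embed_source(source), embed_text(text))
        for source, text in samples
    ]


# Hypothetical stand-ins so the sketch runs without any model weights.
def toy_imagebind(source: str) -> Vector:
    return [float(len(source)), 1.0]


def toy_t5(text: str) -> Vector:
    return [float(len(text.split())), 0.0, 1.0]


pairs = build_pairs(
    [("assets/dog.jpg", "a dog running on the beach")],
    modality="image",
    embed_source=toy_imagebind,
    embed_text=toy_t5,
)
```

Keeping the modality on each record makes it easy to mix the three datasets into one training set while still being able to filter by input type.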

2. Mapping Model

Train

python -m embedding.mapper_train
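
The architecture used by `embedding.mapper_train` is not described here, so the following is only a toy sketch of the general idea: learn a function that maps a source embedding onto a target embedding by minimizing mean squared error. It assumes a plain linear map trained with full-batch gradient descent on synthetic vectors. The names `apply_mapper` and `train_linear_mapper` are hypothetical, and a real mapper would more likely be a PyTorch module operating on ImageBind and T5 tensors.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def apply_mapper(weights: Matrix, x: Vector) -> Vector:
    """Map a source embedding to the target space (matrix-vector product)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def train_linear_mapper(
    sources: List[Vector],
    targets: List[Vector],
    lr: float = 0.2,
    epochs: int = 300,
    seed: int = 0,
) -> Tuple[Matrix, List[float]]:
    """Fit a linear map with full-batch gradient descent on the MSE loss."""
    rng = random.Random(seed)
    in_dim, out_dim = len(sources[0]), len(targets[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    n = len(sources)
    losses = []
    for _ in range(epochs):
        grad = [[0.0] * in_dim for _ in range(out_dim)]
        total = 0.0
        for x, y in zip(sources, targets):
            pred = apply_mapper(weights, x)
            for i in range(out_dim):
                err = pred[i] - y[i]
                total += err * err
                for j in range(in_dim):
                    grad[i][j] += 2.0 * err * x[j]
        losses.append(total / n)  # loss measured before this epoch's update
        for i in range(out_dim):
            for j in range(in_dim):
                weights[i][j] -= lr * grad[i][j] / n
    return weights, losses


# Synthetic data: 3-d "source" vectors and 2-d "target" vectors (toy sizes).
rng = random.Random(1)
true_map = [[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]]
sources = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
targets = [apply_mapper(true_map, x) for x in sources]
weights, losses = train_linear_mapper(sources, targets)
```

Comparing `losses[0]` with `losses[-1]` gives a quick check that training is making progress on the toy data.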

Inference

Change the prompts to text, an image path, or an audio path.
You can download our trained weights into the models folder using this link.

python -m run_inference
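
Since a prompt can be plain text, an image path, or an audio path, one way to tell them apart is by file extension. The sketch below is a hypothetical helper, not the logic used in `run_inference`. The function name `detect_modality` and the extension lists are assumptions made for illustration.

```python
from pathlib import Path

# Assumed extension lists; adjust to the formats your loader supports.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg"}


def detect_modality(prompt: str) -> str:
    """Guess whether a prompt is text, an image path, or an audio path."""
    suffix = Path(prompt).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in AUDIO_EXTS:
        return "audio"
    # Anything without a known media extension is treated as a text prompt.
    return "text"


prompts = ["A dog running on the beach", "assets/dog.jpg", "assets/rain.WAV"]
modalities = [detect_modality(p) for p in prompts]
```

This heuristic only inspects the string, so a text prompt that happens to end in a media extension would be misclassified. Checking that the path exists on disk would make it stricter.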