- ë°ì°¬í: PM, AI lead
- ìµê°ì€: AI
- ë°ì¹íž: AI
- ì ì ì¬: Data
- ì ê°ê±Ž: Data
This project is an attempt to combine a video generation model with ImageBind into a pipeline that can generate video from multimodal input.
We prepare text-text, image-text, and audio-text datasets to generate embedding pairs from ImageBind and the T5 embedding model. The mapper is then trained on these pairs:
python -m embedding.mapper_train
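To illustrate the idea of learning a mapping between paired embeddings, here is a minimal sketch in plain Python. It is not the code in `embedding.mapper_train`; the names `train_linear_mapper`, `apply_mapper`, and `mse` are hypothetical, the mapper is assumed to be a single linear layer, and the toy 4-to-3 dimensions stand in for the much larger ImageBind and T5 embedding sizes.

```python
import random

def apply_mapper(W, src):
    """Map one source embedding to the target space with weight matrix W."""
    return [sum(w * x for w, x in zip(row, src)) for row in W]

def mse(W, pairs):
    """Mean squared error of the mapper over all (source, target) pairs."""
    total, count = 0.0, 0
    for src, tgt in pairs:
        pred = apply_mapper(W, src)
        total += sum((p - t) ** 2 for p, t in zip(pred, tgt))
        count += len(tgt)
    return total / count

def train_linear_mapper(pairs, in_dim, out_dim, lr=0.05, epochs=200, seed=0):
    """Fit W (out_dim x in_dim) by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    for _ in range(epochs):
        for src, tgt in pairs:
            pred = apply_mapper(W, src)
            for i in range(out_dim):
                err = pred[i] - tgt[i]
                for j in range(in_dim):
                    W[i][j] -= lr * err * src[j]
    return W

# Toy paired data: "source" vectors play the role of ImageBind embeddings,
# "target" vectors play the role of T5 embeddings (assumption for illustration).
rng = random.Random(1)
IN_DIM, OUT_DIM = 4, 3
true_W = [[rng.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(OUT_DIM)]
sources = [[rng.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(32)]
pairs = [(s, apply_mapper(true_W, s)) for s in sources]

untrained_W = train_linear_mapper(pairs, IN_DIM, OUT_DIM, epochs=0)
trained_W = train_linear_mapper(pairs, IN_DIM, OUT_DIM)
loss_before = mse(untrained_W, pairs)
loss_after = mse(trained_W, pairs)
```

In this sketch `loss_after` should end up far below `loss_before`, which is the same signal one would watch for when training a real mapper on the prepared embedding pairs.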
Change the prompts to text, an image path, or an audio path.
You can download our trained weights using this link and place them in the models folder.
python -m run_inference
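Because a prompt can be plain text, an image path, or an audio path, an inference script needs some way to tell them apart. The sketch below shows one possible approach based on file extensions. The function `guess_modality` and the extension sets are hypothetical and are not taken from `run_inference`.

```python
from pathlib import Path

# Assumed extension lists; adjust to whatever the loaders actually support.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg"}

def guess_modality(prompt: str) -> str:
    """Return "image", "audio", or "text" based on the prompt's file extension."""
    suffix = Path(prompt).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in AUDIO_EXTS:
        return "audio"
    # Anything without a known media extension is treated as a text prompt.
    return "text"

prompts = ["a dog barking in the rain", "assets/dog.PNG", "sounds/rain.wav"]
modalities = [guess_modality(p) for p in prompts]
```

Each prompt could then be routed to the matching ImageBind encoder before the mapped embedding is passed to the video generation model.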