voletiv/mcvd-pytorch

Usage for brand new private videos

Closed this issue · 4 comments

Any recommended checkpoint or strategy to just try to predict the next frames of a brand new video, without training from scratch?

You could try to implement LoRA; it's very popular for fine-tuning both LLMs and Stable Diffusion. I guess you could use the Cityscapes model as the pre-trained model, depending on your application. We have no code for fine-tuning though.
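Since MCVD has no fine-tuning code, here is a minimal sketch of the LoRA idea in PyTorch: freeze a pre-trained layer and learn a low-rank additive update. The `LoRALinear` wrapper and all names are hypothetical, not part of this repo; in practice you would wrap the model's attention/projection layers rather than a standalone `nn.Linear`.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical LoRA wrapper: y = W x + (alpha / r) * B A x,
    where the base weight W is frozen and only A (r x in) and
    B (out x r) are trained."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pre-trained weights
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts as identity update
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# Toy usage: wrap one layer; only lora_A and lora_B receive gradients.
layer = LoRALinear(nn.Linear(64, 64), r=4)
out = layer(torch.randn(2, 64))
```

Because `lora_B` starts at zero, the wrapped layer initially reproduces the pre-trained output exactly, so fine-tuning starts from the checkpoint's behavior.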

Oh, I was actually wondering whether it would work on a new video out of the box, i.e. as a pre-trained model that delivers some baseline performance everywhere without fine-tuning.

No, this one is trained on specific datasets. You should try looking for text-to-video models; that way you could prompt the model to get what you want.

ok thanks :)