ykasten/layered-neural-atlases

can this framework do long video fusion?


Hello, author, thank you for your work. Can this framework handle long videos, for example a 5-minute video? Also, how are the images under edit_inputs obtained? The paper not only changes the flowers on the clothes but also changes the chairs; are two separate edit images constructed?

We've worked on videos containing up to 70 frames. Regarding the edit, you need to provide an RGBA edit image for every layer separately. In this repo we provide examples that apply an edit to only one layer (either foreground or background), but note that each of these examples uses two RGBA images, one for each layer. It's just that one of the images is blank (because we're editing only one layer of the video).
You can see examples of these RGBA image pairs after downloading the pretrained models: you will find them under pretrained_models/edit_inputs/<video-name> with the names edit_<video-name>_foreground.png and edit_<video-name>_background.png.
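For the layer you are not editing, the "blank" image is simply a fully transparent RGBA image. A minimal sketch (not taken from the repo) of creating one with Pillow; the video name and atlas resolution here are assumptions and should match your own setup:

```python
from PIL import Image

video_name = "blackswan"      # hypothetical video name
atlas_size = (1000, 1000)     # assumed edit-image resolution; match your atlas size

# A fully transparent RGBA image means "no edit" for this layer.
blank = Image.new("RGBA", atlas_size, (0, 0, 0, 0))
blank.save(f"edit_{video_name}_background.png")
```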

Each video needs to be trained separately, right? How long does training take?
Also, I don't quite understand how an image like the one below is constructed.
[Image: edit_blackswan_foreground.png]

The training time depends on your hardware; we report our running times in the paper. Each video is trained separately, and the edit is supplied by the user.
Please refer to the paper for more details.
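For a rough idea of how an edit image like edit_blackswan_foreground.png can be produced (this is one possible workflow, not necessarily the authors' exact one): paint your edit over a rendering of the foreground atlas in any image editor, then keep only the painted pixels as an RGBA image with nonzero alpha. A minimal sketch under that assumption; the file names are illustrative only:

```python
import numpy as np
from PIL import Image

# Hypothetical inputs: the rendered atlas, and the same atlas after painting an edit on it.
original = np.array(Image.open("blackswan_foreground_atlas.png").convert("RGB"), dtype=np.int16)
edited   = np.array(Image.open("blackswan_foreground_atlas_painted.png").convert("RGB"), dtype=np.int16)

# Pixels that were painted over differ from the original atlas (small tolerance, assumed).
changed = np.any(np.abs(edited - original) > 8, axis=-1)

# Keep the painted colors, and set alpha only where the edit was made.
rgba = np.zeros((*changed.shape, 4), dtype=np.uint8)
rgba[..., :3] = edited.astype(np.uint8)
rgba[..., 3] = np.where(changed, 255, 0)

Image.fromarray(rgba, "RGBA").save("edit_blackswan_foreground.png")
```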