yael-vinker/live_sketch

Large optimization/inference time

Closed this issue · 3 comments

Hi, this is truly a great piece of work!
One thing I noticed is that the optimization/inference time per generated video is quite long (an example I tried took about 35 minutes). Is this expected?

Hi,

35 minutes is more or less expected, yes (we reported ~30 minutes on an A100 in the paper).

Two things you can do to reduce this:

  • If you don't need global movement, you can try running without the global path. This should roughly cut the runtime in half. You may need some extra parameter tuning for the local path if you do this.
  • You can try using fewer iterations. In some cases you can get decent results even with ~500 steps.
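As a rough sketch of the two suggestions above (the script and flag names here are hypothetical — check the repository's actual CLI arguments before running):

```shell
# Hypothetical flag names -- substitute the repo's real script and arguments.
# Fewer SDS iterations (e.g. ~500 instead of ~1000) roughly halves runtime:
python run_optimization.py --num_iter 500

# Additionally skipping the global motion path, if global movement isn't needed:
python run_optimization.py --num_iter 500 --no_global_path
```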

@rinongal Just curious: is the long inference time due to running diffvg at each iteration for each frame?

The biggest factor is that we're an optimization-based approach. We use Score Distillation Sampling (SDS), and SDS often requires many steps to converge well. You can see this, for example, in the original DreamFusion paper that proposed SDS: it takes 1.5 hours on 4 GPUs to render a single scene.
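To illustrate why SDS needs many iterations, here is a minimal toy sketch of the SDS update rule in numpy. The "denoiser" below is an entirely hypothetical stand-in (real SDS queries a large pretrained text-conditioned diffusion model), and the parameters being optimized are raw pixels rather than the vector-graphics parameters this repo feeds through diffvg. The key point is the gradient form: SDS perturbs the current render with noise, asks the model to predict that noise, and uses (predicted noise − injected noise) as the gradient, skipping backprop through the denoiser. Each step gives only a small, noisy-level-dependent nudge, which is why many steps are needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained denoiser": predicts the injected noise, plus a
# small term pulling the image toward a target the model considers likely.
# Real SDS uses a large diffusion model here; this is just a toy stand-in.
target = np.full(16, 0.5)

def predict_noise(x_noisy, x, sigma):
    # Model's noise estimate: the actual perturbation, biased toward target.
    return (x_noisy - x) / sigma + 0.1 * (x - target)

theta = rng.normal(size=16)  # parameters being optimized (here: raw pixels)
lr = 0.5

for step in range(1000):
    sigma = rng.uniform(0.1, 1.0)      # random noise level, as in SDS
    eps = rng.normal(size=16)          # injected noise
    x_noisy = theta + sigma * eps
    eps_hat = predict_noise(x_noisy, theta, sigma)
    # SDS gradient: (eps_hat - eps); no backprop through the denoiser.
    theta -= lr * (eps_hat - eps)

# Mean absolute error vs. the target shrinks slowly over many steps.
print(np.abs(theta - target).mean())
```

Note that each update moves theta only a small fraction of the way toward the model's preferred output, so convergence takes on the order of hundreds to thousands of steps, and each of those steps also pays the cost of rendering (diffvg, in our case).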

If we could use 10 steps instead of 1000 (for example, with better initialization or a stronger motion signal), we'd be done in about 20 seconds. Hopefully some follow-up work will get us there :)