Costwen/Ouroboros3D

Some questions

Hi! I have some questions regarding the paper.

  1. What is $x_0$?
  • In Eq. 2, which image does LGM reconstruct?
  • My guess is that it is the noised multi-view image generated by SVD, so that it can later be used as the geometry-consistent condition.
  2. In Algorithm 1
  • Is cond_image the first frame, and is $x$ the pre-sampled multi-view image?
  • What exactly is target? (It does not seem to be self-referenced.)
  3. Image-to-3D evaluation
  • In Table 1, did you evaluate the Ouroboros LGM on its own after training (without the diffusion model)?
  4. Why not use SV3D?
  • SV3D already incorporates a 3D representation into SVD.
  • Why not use SV3D as the baseline video diffusion model?

Thanks!