Some questions
yjhong89 commented
Hi! I have some questions regarding the paper.
- What is $x_0$?
- In Eq. 2, which image does LGM reconstruct?
  - My guess is that it is the noised multi-view images generated by SVD, so that they can be used as the geometry-consistent condition later.
- Is `cond_image` the first frame, and is $x$ the pre-sampled multi-view images?
- What exactly is `target`? (It does not seem to be self-referenced.)
- Image-to-3D evaluation
  - In Table 1, did you evaluate the Ouroboros LGM on its own after training (without the diffusion model)?
- Why not use SV3D?
  - SV3D already incorporates a 3D representation into SVD.
  - Why not use SV3D as the baseline video diffusion model?
Thanks!