Costwen/Ouroboros3D

Some questions

Opened this issue 2 months ago · 0 comments

yjhong89 commented 2 months ago

Hi! I have some questions regarding the paper.

What is $x_0$?

In Eq. 2, which images does LGM reconstruct?
My guess is that they are the noised multi-view images generated by SVD, so that the reconstruction can later be used as the geometry-consistent condition.

In Algorithm 1

Is `cond_image` the first frame, and is $x$ the set of pre-sampled multi-view images?
What exactly is `target`? It does not seem to be self-referenced.

Image-to-3D evaluation

In Table 1, did you evaluate the Ouroboros LGM on its own after training (without the diffusion model)?

Why not use SV3D?

SV3D already incorporates a 3D representation on top of SVD.
Why not use SV3D as the baseline video diffusion model?

Thanks!