vsitzmann/scene-representation-networks

Questions about the paper: are dataset-specific model parameters necessary?

ventusff opened this issue · 0 comments

Hi,

First of all, thanks for your work, which has been a great help to me. I'm currently working on using such scene representations (possibly with some modifications) as input to mobile robot policy networks.

I've done some reading of your paper. If I've understood it correctly, the method needs to train a separate model for every category of objects/datasets, i.e., the latent codes z, the mapping function psi, and the neural rendering function theta are all trained per dataset.
I believe this is the case, since you provide different pretrained models for the car and chair datasets.
It seems to be the same with other methods in this area, such as GQN.
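To make sure I'm reading the architecture right, here is a minimal structural sketch of what I mean by "dataset-specific" (PyTorch-style; this is not your actual code, and all class names, layer sizes, and scene counts here are illustrative, with the ray marcher omitted):

```python
import torch
import torch.nn as nn

class SRN(nn.Module):
    """Structural sketch of one dataset-specific SRN (ray marching omitted)."""
    def __init__(self, num_scenes, latent_dim=256, hidden=64, feat_dim=32):
        super().__init__()
        self.hidden, self.feat_dim = hidden, feat_dim
        # z: one latent code per scene instance in this dataset.
        self.z = nn.Embedding(num_scenes, latent_dim)
        # psi: hypernetwork mapping z to the weights of the scene MLP.
        n_params = 3 * hidden + hidden + hidden * feat_dim + feat_dim
        self.psi = nn.Linear(latent_dim, n_params)
        # theta: neural renderer / pixel generator on per-point features.
        self.theta = nn.Linear(feat_dim, 3)  # -> RGB

    def render(self, z, xyz):
        """Run the scene MLP with weights produced by psi(z), then theta."""
        h, f = self.hidden, self.feat_dim
        p = self.psi(z)                                # flat weight vector
        w1 = p[:3 * h].view(3, h)
        b1 = p[3 * h:4 * h]
        w2 = p[4 * h:4 * h + h * f].view(h, f)
        b2 = p[-f:]
        feat = torch.relu(xyz @ w1 + b1) @ w2 + b2     # scene MLP on 3D points
        return self.theta(feat)                        # per-point colour

    def forward(self, scene_idx, xyz):
        return self.render(self.z(scene_idx), xyz)

# Under the paper's setup, each category gets a fully separate instance:
srn_cars = SRN(num_scenes=1000)    # scene counts illustrative
srn_chairs = SRN(num_scenes=1000)
```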

I'm wondering whether it's possible that only the prior/initial latent code z needs to be dataset-specific, while the other networks (the mapping and rendering networks) could be shared across all types of datasets.

Intuitively, I think the latent code should carry enough prior information, and it would save a great deal of training time if different object types shared the remaining networks (sketched below). Since I'm trying to extract representations for complex scenes composed of many object types, and I want a representation for each detected object, this would be much more convenient.
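Concretely, the variant I have in mind would look roughly like this, reusing the illustrative `SRN` sketch above (again, just a sketch of the idea, not an implementation):

```python
# Only the latent-code tables are dataset-specific; psi and theta are shared.
z_tables = {"cars":   nn.Embedding(1000, 256),   # counts/sizes illustrative
            "chairs": nn.Embedding(1000, 256)}
shared = SRN(num_scenes=1)                       # its built-in z table goes unused

def render(dataset, scene_idx, xyz):
    z = z_tables[dataset](scene_idx)             # dataset-specific prior only
    return shared.render(z, xyz)                 # shared psi and theta

# e.g. render("cars", torch.tensor(3), torch.randn(8, 3))
```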

What would be the cost of making this assumption? A loss of accuracy?

BTW, I see that you are currently working on compositional SRNs, which is of huge interest to me.
May I ask whether you are using topological graphs to model such compositional relations? Feel free not to answer if you'd rather not say.