facebookresearch/iSDF

Query about several params

JoyHighmore opened this issue · 6 comments

Thanks for your brilliant work!

I have several questions regarding your code:

  1. How do you get the depth_scale?
  2. Why is it necessary to scale the input and output of the network, and how do you get the scale factor? Also, regarding the inverse transform applied to the input: why is it necessary to place the object centre at the origin (if my understanding is correct)?
  3. What is the meaning of frac_time_perception?

Hi, thanks for taking the time to run the code and for the questions!

  1. depth_scale is just a property of how the depth images are saved. e.g. for ScanNet the depth images are of type uint16 and the depth values are in mm, so to convert to metres we divide by depth_scale = 1000. For ReplicaCAD, the depth values are saved so that the max depth is 20m; as the max uint16 value is 65535, the depth scale is 3276.75 = 65535 / 20 (see the sketch after this list).
  2. It improves performance to scale the input to roughly [-1, 1]; similarly, scaling the output makes it easier for the network to train given its biases. The inverse transform before applying the positional encoding makes little difference.
  3. frac_perception_time is a parameter that we used to simulate operation with different compute budgets. Simply set this to 1 for normal operation.
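
To make point 1 concrete, here is a minimal sketch of the depth conversion (the file name and the use of OpenCV are illustrative, not necessarily what iSDF's own data loader does):

```python
import cv2
import numpy as np

# ScanNet: depths are stored in millimetres, so depth_scale = 1000.
# ReplicaCAD: a max depth of 20 m is mapped onto the uint16 range,
# so depth_scale = 65535 / 20 = 3276.75.
depth_scale = 1000.0  # set per dataset

raw = cv2.imread("frame0000_depth.png", cv2.IMREAD_UNCHANGED)  # uint16, dataset units
depth_m = raw.astype(np.float32) / depth_scale                 # depth in metres
depth_m[raw == 0] = 0.0                                        # keep invalid pixels at zero
```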

Feel free to reopen if you have follow up questions.

Thank you so much for the reply!

I have one more question about the relation between the GPU device and frac_perception_time.

I was testing on a 3090, and the average time for one step is around 25-30 ms. In your paper, one step takes around 33 ms, so I guess I need to set frac_perception_time to 0.75. However, the step time does not seem to change.

Also, would you consider publishing some numerical results so we could compare? My results seem to match the figures in the paper, but I am not quite sure.

In our paper, we used a slightly older GPU, so it seems about right that your average step time is a bit lower. If you set frac_perception_time to 0.75, the printed value will still remain the same, but it will be scaled when accumulated into the total step time (https://github.com/facebookresearch/iSDF/blob/main/isdf/modules/trainer.py#L994). This is what is used to model the total perception time.
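
As a rough sketch of that bookkeeping (the names and numbers below are illustrative placeholders, not the actual variables in trainer.py):

```python
import time

def train_step():
    """Placeholder for one iSDF optimisation step."""
    time.sleep(0.03)

frac_perception_time = 0.75  # fraction of the real-time budget available for perception
tot_step_time = 0.0          # simulated elapsed perception time

for _ in range(10):
    t0 = time.time()
    train_step()
    step_time = time.time() - t0  # the printed step time is unaffected by the setting

    # Each step is charged against only a fraction of the compute budget,
    # so the accumulated total grows by step_time / frac_perception_time.
    tot_step_time += step_time / frac_perception_time

print(f"total simulated perception time: {tot_step_time:.2f} s")
```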

@joeaortiz I am also a bit confused about frac_perception_time. How do you get different evaluation points according to frac_perception_time? Also, if our step time is different from yours, will that have a large influence? I mean, can I just set frac_perception_time to 1 but use a different GPU?

Hi Jingyi, in the appendix we do an experiment to evaluate performance when the system can use x% of the real-time compute budget for perception. For example, if the planner needs 0.5s for trajectory optimisation every second, then iSDF can use only 50% of the compute budget, so we set frac_perception_time = 0.5. We evaluate this by multiplying the step time by 1 / frac_perception_time = 2 when we add it to the total elapsed time. You can see in the appendix that we evaluate iSDF (and the other methods) at 3 different values of frac_perception_time.
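
As an illustrative back-of-the-envelope calculation (the 33 ms step time is taken from the discussion above; the rest are assumed numbers, not measured results):

```python
step_time = 0.033            # seconds per iSDF training step
frac_perception_time = 0.5   # e.g. the planner uses the other half of each second

# Each step is charged step_time / frac_perception_time = 0.066 s of
# simulated elapsed time, so only ~15 steps fit into a 1 s evaluation
# window instead of ~30 when frac_perception_time = 1.
charged_time = step_time / frac_perception_time
steps_per_second = int(1.0 / charged_time)
print(charged_time, steps_per_second)  # -> 0.066 15
```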

You will get slightly different results if you are using a different GPU, since your step time will be different, but it shouldn't have a big effect.

In general, unless you are trying to reproduce our experiments in Appendix A (Fig. 9), always set frac_perception_time = 1!


To clarify this misunderstanding: do not change frac_perception_time just because your GPU is faster than the one we used for the evaluations! Your results will just be a bit better than ours, as you can do more training steps per second.