XT5un/P2NeRF

Error occurs in get_depth_rank_batch function

Closed this issue · 6 comments

Hi,

I am training P2NeRF on my own dataset, but an error occurs in the get_depth_rank_batch function.
When I run the training script, I get this error:
[Screenshot 2024-09-16, 2:24:23 PM]

The error above occurs because "idxs = np.where(rank_map == i)[0]" returns an empty result.
What could the problem be? I use the DPT model to extract depth, and I successfully extracted the priors using LoFTR.
As noted in the code, I also use a single GPU.

Thank you for the great work!

XT5un commented

Based on your description, I'm guessing it could be due to a problem with the monocular depth or with the percentile calculation.

Could you check depth, vmin, vmax and th here in debug mode?

P2NeRF/internal/datasets.py

Lines 849 to 855 in 5631035

q = [i * 100.0 / rank_level for i in range(1, rank_level)]
vmin = np.percentile(depth, 1)
vmax = np.percentile(depth, 99)
th = np.percentile(depth[(depth > vmin) & (depth < vmax)], q)
rank_map = np.zeros(depth.shape, dtype=np.int32)
for i in range(len(th)):
  rank_map[depth > th[i]] = i + 1

Yes, I have already checked those values.

When I print vmin, vmax, th, and depth at the point where the error occurs, the results are as follows:
[Screenshot 2024-09-17, 11:54:00 AM]
[Screenshot 2024-09-17, 11:54:19 AM]
[Screenshot 2024-09-17, 11:54:33 AM]

The last one is the corresponding DPT depth map. I don't know what the problem is.

This error occurs only occasionally, and training even works normally on a subset of my datasets. So, would it be okay to skip such cases with a conditional statement when the error occurs?

XT5un commented

The problem is probably due to an uneven depth distribution: the first few values of th are all the same, which causes the later groupings to overwrite the earlier ones when assigning values to rank_map.
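A small synthetic reproduction may help make this concrete. The sketch below is not taken from the repository apart from the thresholding lines quoted earlier; the depth values, the 1-D shape, and rank_level = 10 are assumptions chosen so that a large share of samples have the same depth.

```python
import numpy as np

rank_level = 10  # assumed value, for illustration only
rng = np.random.default_rng(0)

# Synthetic 1-D depth batch (assumption): about 60% of samples share one value.
depth = np.concatenate([
    np.linspace(0.1, 0.9, 20),    # a few near samples
    np.full(600, 1.0),            # large constant-depth region
    np.linspace(2.0, 10.0, 400),  # remaining samples
])
rng.shuffle(depth)

# Same thresholding as the quoted lines from internal/datasets.py.
q = [i * 100.0 / rank_level for i in range(1, rank_level)]
vmin = np.percentile(depth, 1)
vmax = np.percentile(depth, 99)
th = np.percentile(depth[(depth > vmin) & (depth < vmax)], q)
rank_map = np.zeros(depth.shape, dtype=np.int32)
for i in range(len(th)):
    rank_map[depth > th[i]] = i + 1

# Count how many samples landed in each rank and list the empty ones.
counts = np.bincount(rank_map, minlength=rank_level)
empty_ranks = np.flatnonzero(counts == 0).tolist()
print("thresholds:", np.round(th, 3))
print("samples per rank:", counts)
print("empty ranks:", empty_ranks)
```

When several thresholds coincide, every sample above the shared threshold ends up with the highest of the tied rank indices, so the lower tied ranks receive no samples and np.where(rank_map == i)[0] is empty for them.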

I think there are two possible solutions. One is to try an uneven grouping strategy. The other is to check when assigning values to rank_map and assign at most len(depth) // rank_level elements at a time.

Here is some example code that I modified. I haven't tested it, as I don't have test data, but the basic idea is to control the number of elements assigned at a time.

  # calculate rank map
  q = [i * 100.0 / rank_level for i in range(1, rank_level)]
  vmin = np.percentile(depth, 1)
  vmax = np.percentile(depth, 99)
  th = np.percentile(depth[(depth > vmin) & (depth < vmax)], q)
  #######! NEW
  rank_map = np.zeros(depth.shape, dtype=np.int32) - 1  # -1 for unassigned
  num_per_rank = len(depth) // rank_level
  # assign 0~rank_level-1 group
  for i in range(len(th)):
    mask = (depth <= th[i]) & (rank_map == -1)
    if mask.sum() > num_per_rank:
      mask_idxs = np.where(mask)[0]
      sample_mask_idxs = np.random.choice(mask_idxs, num_per_rank, replace=False)
      rank_map[sample_mask_idxs] = i
    else:
      rank_map[mask] = i
  # assign the rest
  rank_map[rank_map == -1] = rank_level - 1
  #######! NEW
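For anyone who wants to try the capped assignment outside the training loop, here is a self-contained sketch of the same idea. The function name capped_rank_map, the rng argument, and the synthetic depth array are hypothetical and not part of P2NeRF; the sketch assumes depth is a 1-D array, as the indexing in the snippet above implies.

```python
import numpy as np

def capped_rank_map(depth, rank_level, rng=None):
    """Hypothetical wrapper around the capped assignment above (1-D depth)."""
    rng = np.random.default_rng() if rng is None else rng
    q = [i * 100.0 / rank_level for i in range(1, rank_level)]
    vmin = np.percentile(depth, 1)
    vmax = np.percentile(depth, 99)
    th = np.percentile(depth[(depth > vmin) & (depth < vmax)], q)

    rank_map = np.full(depth.shape, -1, dtype=np.int32)  # -1 marks unassigned
    num_per_rank = len(depth) // rank_level
    for i in range(len(th)):
        mask = (depth <= th[i]) & (rank_map == -1)
        if mask.sum() > num_per_rank:
            # Too many candidates: keep a random subset for this rank.
            idxs = np.where(mask)[0]
            chosen = rng.choice(idxs, num_per_rank, replace=False)
            rank_map[chosen] = i
        else:
            rank_map[mask] = i
    rank_map[rank_map == -1] = rank_level - 1  # everything left goes to the last rank
    return rank_map

# Synthetic skewed depth (assumption): many samples share the value 1.0.
depth = np.concatenate([
    np.linspace(0.1, 0.9, 20),
    np.full(600, 1.0),
    np.linspace(2.0, 10.0, 400),
])
rank_map = capped_rank_map(depth, rank_level=10, rng=np.random.default_rng(0))
counts = np.bincount(rank_map, minlength=10)
print("samples per rank:", counts)
```

One consequence of the cap is that samples with identical depth can be placed in different ranks at random, so in large flat regions the rank order does not necessarily reflect a real depth difference.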

By the way, from your depth map visualization, it seems that the window takes up a large part of this image, and the depth at the window should be the larger value. But the th values reflect a situation where the smaller depths take up the larger share. Our code expects depth, not the inverse depth that DPT outputs, so you may need to invert the inverse depth or modify the code.

Thanks a lot! The provided code works fine!

Following your suggestion, I am extracting the depth again. Which model do you use for extracting depth? Could you please give me some information about that (e.g., dpt_hybrid, dpt_large, dpt_swin ...)?

Also, the vanilla DPT model predicts inverse depth, not depth. I think that if we invert the inverse depth, we would need to know the scale and shift to recover accurate depth. Could you provide a detailed description of the depth extraction? How can I invert the inverse depth, or modify the code, so that it is suitable for P2NeRF?

XT5un commented

We describe this part in the supplementary material. The model is DPT_Hybrid. We normalize the predicted depth using its maximum and minimum values and then invert it.

depth = 1 - ((depth - depth.min()) / (depth.max() - depth.min()))

In fact, normalization is not necessary. Inversion alone is also OK. Normalization is for convenience of visualization.
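As a minimal sketch, the normalize-then-invert step can be wrapped in a small helper. The function name invert_dpt_prediction and the eps guard against a constant prediction are assumptions added for illustration; only the formula itself comes from the reply above.

```python
import numpy as np

def invert_dpt_prediction(inv_depth, eps=1e-8):
    """Min-max normalize a DPT prediction and invert it (hypothetical helper).

    Large inverse-depth values (near surfaces) map to small outputs, and
    small inverse-depth values (far surfaces) map to large outputs.
    """
    inv_depth = np.asarray(inv_depth, dtype=np.float64)
    lo = inv_depth.min()
    # eps avoids division by zero when the prediction is constant.
    value_range = max(float(inv_depth.max() - lo), eps)
    return 1.0 - (inv_depth - lo) / value_range

# Toy 2x2 prediction: 4.0 is the nearest pixel, 0.0 the farthest.
inv_depth = np.array([[0.0, 2.0], [4.0, 1.0]])
depth = invert_dpt_prediction(inv_depth)
print(depth)
```

The result is a relative depth in [0, 1], not a metric depth. The rank map is built from percentile thresholds, so it depends mainly on the ordering of the values, which may be why the scale and shift are not needed here.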

Thanks a lot!