reproduce evaluation results
waterljwant opened this issue · 3 comments
Hi,
Thank you for the great open-source work.
However, I am currently having difficulty reproducing the evaluation results, particularly for scene classification on NYU-D and SUN-D. I have attached the results I obtained after running the provided script.
Could you please help me identify any steps or details I might have missed that could explain this inconsistency in accuracy?
Hi,
To reproduce the results on NYU-D and SUN-D:
- Please follow the instructions for inference: download the vitlensL-depth checkpoint and run:
```bash
cd vitlens/  # you may change the path accordingly
torchrun --nproc_per_node=1 ./src/training/depth/depth_tri_main.py \
  --cache_dir /path_to/cache \
  --val-data sun-rgbd::nyu-depth-v2-val1::nyu-depth-v2-val2 \
  --visual_modality_type depth --dataset-type depth --v_key depth \
  --n_tower 3 \
  --use_perceiver --perceiver_cross_dim_head 64 --perceiver_latent_dim 1024 \
  --perceiver_latent_dim_head 64 --perceiver_latent_heads 16 \
  --perceiver_num_latents 256 --perceiver_as_identity \
  --use_visual_adapter \
  --batch-size 64 \
  --lock-image --lock-text --lock-visual --unlock-trans-first-n-layers 4 \
  --model ViT-L-14 --pretrained datacomp_xl_s13b_b90k \
  --name depth/inference_vitlensL_perf \
  --resume /path_to/vitlensL_depth.pt
```
- We follow ImageBind for data preprocessing (converting depth to disparity); please make sure you apply the same operation. See here. I also uploaded a copy here. A rough sketch of the conversion is shown after this list.
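For reference, a minimal sketch of an ImageBind-style depth-to-disparity conversion is below. This is not the repository's exact preprocessing code; the function name and the default baseline/focal-length values are placeholders, and you should take the actual values from the preprocessing script linked above.

```python
import numpy as np

def depth_to_disparity(depth_m: np.ndarray,
                       baseline: float = 0.075,      # assumed stereo baseline in meters
                       focal_length: float = 518.8   # assumed focal length in pixels
                       ) -> np.ndarray:
    """Convert a metric depth map (meters) to disparity.

    Invalid pixels (depth == 0) are mapped to 0 disparity.
    """
    disparity = np.zeros_like(depth_m, dtype=np.float32)
    valid = depth_m > 0
    disparity[valid] = (baseline * focal_length) / depth_m[valid]
    return disparity
```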
If you still cannot reproduce the results (Table 5 in the paper), you may provide your env setup so that I can look into this.
@StanLei52 Thank you! I found that I had mistakenly used different depth data. After adjusting it according to this code: `depth_dir = os.path.join(path, "depth_bfx")`, the accuracy is consistent.
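For anyone hitting the same issue, here is a minimal sketch of the relevant path construction. SUN RGB-D provides both a raw `depth` folder and a refined/inpainted `depth_bfx` folder per scene, and the evaluation expects the refined maps. The function and variable names below are illustrative, not the repo's actual loader.

```python
import glob
import os

def get_depth_path(scene_path: str) -> str:
    """Return the refined depth map for a SUN RGB-D scene directory."""
    depth_dir = os.path.join(scene_path, "depth_bfx")  # refined depth, not the raw "depth" folder
    depth_files = sorted(glob.glob(os.path.join(depth_dir, "*.png")))
    return depth_files[0]
```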