reproduce evaluation results
waterljwant opened this issue · 3 comments
Hi,
Thank you for the great open-source work.
However, I am currently having difficulty reproducing the evaluation results, particularly for scene classification on NYU-D and SUN-D. I have attached the results I obtained after running the provided script.
Could you please help me identify any steps or details I might have missed that could explain this inconsistency in accuracy?
Hi,
To reproduce the results on NYU-D and SUN-D:
- Please follow the instructions for inference: download the vitlensL-depth checkpoint and run:
```bash
cd vitlens/  # you may change the path accordingly
torchrun --nproc_per_node=1 ./src/training/depth/depth_tri_main.py \
  --cache_dir /path_to/cache \
  --val-data sun-rgbd::nyu-depth-v2-val1::nyu-depth-v2-val2 \
  --visual_modality_type depth --dataset-type depth --v_key depth \
  --n_tower 3 \
  --use_perceiver --perceiver_cross_dim_head 64 --perceiver_latent_dim 1024 \
  --perceiver_latent_dim_head 64 --perceiver_latent_heads 16 \
  --perceiver_num_latents 256 --perceiver_as_identity \
  --use_visual_adapter \
  --batch-size 64 \
  --lock-image --lock-text --lock-visual --unlock-trans-first-n-layers 4 \
  --model ViT-L-14 --pretrained datacomp_xl_s13b_b90k \
  --name depth/inference_vitlensL_perf \
  --resume /path_to/vitlensL_depth.pt
```
- We follow ImageBind for data preprocessing (converting depth to disparity); please make sure you apply the same operation. See here. I also uploaded a copy here. A rough sketch of the conversion is shown after this list.
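For reference, a minimal sketch of an ImageBind-style depth-to-disparity conversion is below. This is not the repository's exact preprocessing code; the function name and the default baseline/focal-length values are placeholders, and you should take the actual values from the preprocessing script linked above.

```python
import numpy as np

def depth_to_disparity(depth_m: np.ndarray,
                       baseline: float = 0.075,      # assumed stereo baseline in meters
                       focal_length: float = 518.8   # assumed focal length in pixels
                       ) -> np.ndarray:
    """Convert a metric depth map (meters) to disparity.

    Invalid pixels (depth == 0) are mapped to 0 disparity.
    """
    disparity = np.zeros_like(depth_m, dtype=np.float32)
    valid = depth_m > 0
    disparity[valid] = (baseline * focal_length) / depth_m[valid]
    return disparity
```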
If you still cannot reproduce the results (Table 5 in the paper), you may provide your env setup so that I can look into this.
@StanLei52 Thank you! I found that I had mistakenly used different depth data. After adjusting it according to this code: `depth_dir = os.path.join(path, "depth_bfx")`, the accuracy is consistent.
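For anyone hitting the same issue, here is a minimal sketch of the relevant path construction. SUN RGB-D provides both a raw `depth` folder and a refined/inpainted `depth_bfx` folder per scene, and the evaluation expects the refined maps. The function and variable names below are illustrative, not the repo's actual loader.

```python
import glob
import os

def get_depth_path(scene_path: str) -> str:
    """Return the refined depth map for a SUN RGB-D scene directory."""
    depth_dir = os.path.join(scene_path, "depth_bfx")  # refined depth, not the raw "depth" folder
    depth_files = sorted(glob.glob(os.path.join(depth_dir, "*.png")))
    return depth_files[0]
```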