VICO-UoE/Looking3D

Issue with inferencing

Closed this issue · 1 comment

First of all, thanks for the Looking3D repo and the nice work. I am trying the inference part of the code and ran into some errors while following the README.md, specifically during the inference step.

I have already set up the sample folder as follows:

Looking3D
└───sample
    ├───query
    │       query_example.png        # one of the images from the dataset with shape_id 179
    │       query_example_2.png      # one of the images from the dataset with shape_id 179
    │       query_example_3.png      # one of the images from the dataset with shape_id 179
    └───mv_images
            179_3.0_0_20.json
            179_3.0_0_20.npy
            179_3.0_0_20.png
            179_3.0_18_20.json
            179_3.0_18_20.npy
            179_3.0_18_20.png
            ...

I wrote a separate inference Python script, as instructed in the README.md, providing the stated query_path, mv_path, resume_ckpt, device and topk.
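
Here is roughly what my predict.py looks like (the import is my reading of the traceback below, which points into demo.py; the checkpoint filename, device and topk values are placeholders):

from demo import predict   # predict() is defined in demo.py per the traceback

pred_labels = predict(
    query_path=r"C:\Users\MyPC\Looking3D\sample\query",             # folder with the query PNGs
    mv_path=r"C:\Users\MyPC\Looking3D\sample\mv_images",            # folder with the multi-view renderings of shape 179
    resume_ckpt=r"C:\Users\MyPC\Looking3D\checkpoints\model.ckpt",  # placeholder checkpoint path
    device="cuda",                                                  # or "cpu"
    topk=5,                                                         # placeholder value
)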

When I run this script, it shows the following error:

(Looking3D) C:\Users\MyPC\Looking3D>python predict.py
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
constructing SpatialTransformer of depth 3 w/ 128 channels and 8 heads
WARNING: SpatialTransformer: Found context dims [256] of depth 1, which does not match the specified 'depth' of 3. Setting context_dim to [256, 256, 256] now.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is None and using 8 heads with a dimension of 64.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is 256 and using 8 heads with a dimension of 64.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is None and using 8 heads with a dimension of 64.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is 256 and using 8 heads with a dimension of 64.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is None and using 8 heads with a dimension of 64.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is 256 and using 8 heads with a dimension of 64.
model loaded successfully
Traceback (most recent call last):
  File "predict.py", line 3, in <module>
    pred_labels = predict(query_path = r"C:\Users\MyPC\Looking3D\sample\query", \
  File "C:\Users\MyPC\Looking3D\demo.py", line 83, in predict
    result = forward_cmt(batch, models, is_train = False, topk = topk)
  File "C:\Users\MyPC\Looking3D\train.py", line 465, in forward_cmt
    imgs, mesh, labels, bbox, pos_enc3d = batch['query_imgs'], batch['mesh_images'], batch['labels'], batch['bbox'], batch['mesh_pos_enc3d']
KeyError: 'query_imgs'

Digging into train.py line 465 in the forward_cmt function, the batch dictionary has different keys from what the function expects (there are no 'query_imgs', 'mesh_images' or 'mesh_pos_enc3d' keys).
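
For reference, I checked the keys with a quick debug print right before the failing unpack (any equivalent inspection would do):

# temporary debug line just above line 465 of train.py, inside forward_cmt
print(sorted(batch.keys()))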

Checking the batch keys shows that the correct mappings are:

'query_imgs'     --> 'imgs'
'mesh_images'    --> 'mesh'
'mesh_pos_enc3d' --> 'pos_enc3d'
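
Concretely, the unpacking line in forward_cmt then becomes (only the key names change; 'labels' and 'bbox' already match):

imgs, mesh, labels, bbox, pos_enc3d = (
    batch['imgs'],        # was batch['query_imgs']
    batch['mesh'],        # was batch['mesh_images']
    batch['labels'],
    batch['bbox'],
    batch['pos_enc3d'],   # was batch['mesh_pos_enc3d']
)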

After changing the dict keys, the inference seems to work, although there are some predictions that I suspect are wrong, as shown below:

(Looking3D) C:\Users\MyPC\Looking3D>python predict.py
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
constructing SpatialTransformer of depth 3 w/ 128 channels and 8 heads
WARNING: SpatialTransformer: Found context dims [256] of depth 1, which does not match the specified 'depth' of 3. Setting context_dim to [256, 256, 256] now.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is None and using 8 heads with a dimension of 64.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is 256 and using 8 heads with a dimension of 64.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is None and using 8 heads with a dimension of 64.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is 256 and using 8 heads with a dimension of 64.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is None and using 8 heads with a dimension of 64.
Setting up MemoryEfficientCrossAttention. Query dim is 512, context_dim is 256 and using 8 heads with a dimension of 64.
model loaded successfully
-> Query_path : C:\Users\MyPC\Looking3D\sample\query\query_example.png
 Anoamly_pred_label : 0
Conf_score : 99.99982126805662
-> Query_path : C:\Users\MyPC\Looking3D\sample\query\query_example_2.png
 Anoamly_pred_label : 1
Conf_score : 86.26554012298584
-> Query_path : C:\Users\MyPC\Looking3D\sample\query\query_example_3.png
 Anoamly_pred_label : 0
Conf_score : 99.99998801540357

I have not retrained the model from scratch, nor have I trained it on my own dataset. I will update this issue if I run into any more errors. Thanks in advance.

As this issue has been open for a long time, I am closing it for now. Please reopen it if required.