paulchhuang/bstro

DEMO Error: can't run ref_vertices.expand(batch_size, -1, -1)

Closed this issue · 10 comments

Hi, I really liked your paper and wanted to try out the demo. I ran the exact command from DEMO.md and got the following error:

Traceback (most recent call last):
  File "/home/oscar/Workspace/bstro/bstro/./metro/tools/demo_bstro.py", line 302, in <module>
    main(args)
  File "/home/oscar/Workspace/bstro/bstro/./metro/tools/demo_bstro.py", line 296, in main
    run_inference(args, _bstro_network, smpl, mesh_sampler)
  File "/home/oscar/Workspace/bstro/bstro/./metro/tools/demo_bstro.py", line 88, in run_inference
    _, _, pred_contact = BSTRO_model(images, smpl, mesh_sampler)
  File "/home/oscar/anaconda3/envs/bstro2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/oscar/Workspace/bstro/bstro/metro/modeling/bert/modeling_bstro.py", line 203, in forward
    ref_vertices = ref_vertices.expand(batch_size, -1, -1)
RuntimeError: The expanded size of the tensor (1) must match the existing size (30) at non-singleton dimension 0.  Target sizes: [1, -1, -1].  Tensor sizes: [30, 431, 3]

My setup:
Python 3.10.5
PyTorch 1.11.0
torchvision 0.12.0
CUDA 11.3.1

Hi,

Unfortunately I didn't encounter the errors you describe. Before line 203, the tensor ref_vertices should have shape [1, 431, 3], which is then expanded to [batch_size, 431, 3] by ref_vertices = ref_vertices.expand(batch_size, -1, -1).

Tensor sizes: [30, 431, 3] suggests batch_size=30, but the current demo code supports only batch_size=1, which is puzzling.
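For illustration only (this is not the repo's code): torch.Tensor.expand can only broadcast size-1 dimensions, which is exactly what the error message is complaining about:

  import torch

  batch_size = 1

  # What the demo expects: a [1, 431, 3] template that broadcasts cleanly.
  ref_vertices = torch.zeros(1, 431, 3)
  print(ref_vertices.expand(batch_size, -1, -1).shape)  # torch.Size([1, 431, 3])

  # What the traceback reports: a [30, 431, 3] tensor. expand() cannot change
  # a non-singleton dimension, so this raises the same RuntimeError.
  broken = torch.zeros(30, 431, 3)
  try:
      broken.expand(batch_size, -1, -1)
  except RuntimeError as err:
      print(err)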

Aligning your dependency versions with the ones in the installation instructions may help.

Thanks!

Hi,
thanks for answering. I managed to make it run by forcing some dimensions to 1: there is a pseudo batch_size = 30 all over the repository, and when I set those to 1 everything worked well.

I have another question: for the demo we run BSTRO with these arguments:

  • --num_hidden_layers 4
  • --num_attention_heads 4

It seems that num_hidden_layers and num_attention_heads can go up to 12. Is it possible to change those arguments and still run the demo? Which architecture gave the best results in your benchmarks?
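For context, a generic BERT-style sanity check on those two values (purely illustrative; the hidden size below is an assumption, not necessarily the one BSTRO uses):

  # Each attention head gets hidden_size // num_attention_heads channels,
  # so the hidden size must be divisible by the head count; 4 and 12 both
  # satisfy this for a 768-dim encoder.
  hidden_size = 768  # assumed value for illustration
  for num_attention_heads in (4, 12):
      assert hidden_size % num_attention_heads == 0
      print(num_attention_heads, hidden_size // num_attention_heads)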

Hi,
Having the same issue here. I tried changing the batch_size parameters as you said, but I'm still getting the same error. Can you share your working fork?
Thanks

Hi,
For the "debatchification" look at the last two commits on my forked repo. Code is ugly but working for me, also I did not manage to display the outputs aside so I directly render them with trimesh.

https://github.com/oscarfossey/bstro
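A minimal sketch of that kind of trimesh rendering (the file name, score array, and 0.5 threshold are placeholders, not the exact code in the fork):

  import numpy as np
  import trimesh

  # Load the mesh written by the demo and fake a per-vertex contact score
  # (in the real pipeline this would come from the model's pred_contact).
  mesh = trimesh.load("contact_vis.obj", process=False)
  contact_scores = np.random.rand(len(mesh.vertices))  # placeholder values

  # Paint contact vertices red, everything else light grey, then open a viewer.
  colors = np.tile([200, 200, 200, 255], (len(mesh.vertices), 1)).astype(np.uint8)
  colors[contact_scores > 0.5] = [255, 0, 0, 255]
  mesh.visual.vertex_colors = colors
  mesh.show()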

Hi,
a few quick replies below:

  1. I got a chance to test the whole installation from scratch on another clean machine, following docs/INSTALL.md. The code still runs without hitting this pseudo batch_size = 30 error, so my best guess now is that it is an environment-dependent issue.
    I will try @oscarfossey's setup next time. Can @hosseinfeiz also share their setup?
    Until we have a generic solution, I'll add a FAQ entry pointing to @oscarfossey's reply above. Is this OK?
  2. I didn't experiment with different --num_hidden_layers and --num_attention_heads. These parameters are the same as the setup in METRO.

Hi,

OK for me, thanks for the precise answers.

And one point I forgot:
3. contact_vis.obj is the final visualization generated by the code. The image was made by loading contact_vis.obj in MeshLab, taking screenshots, and placing them side by side. I should make this clearer in the instructions.

qinb commented

I also met this problem. In my case, the reason is that the SMPL shapedirs dimension is 300 instead of 10.
So the easy fix is to change self.shapedirs.view(-1,10) to self.shapedirs[:, :, :10].view(-1,10) in https://github.com/paulchhuang/bstro/blob/main/metro/modeling/_smpl.py#L74
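A self-contained illustration of why 300 shape components turn into a phantom batch of 30 (vertex and coordinate counts taken from the standard SMPL template; this is not the repo's code):

  import torch

  shapedirs_300 = torch.zeros(6890, 3, 300)  # SMPL model with 300 shape components

  # Original line: view(-1, 10) silently folds the extra 290 components into
  # the leading dimension; 300 / 10 = 30 -> the "virtual" batch_size of 30.
  flat = shapedirs_300.view(-1, 10)
  print(flat.shape[0] // (6890 * 3))  # 30

  # @qinb's fix: keep only the first 10 shape components before flattening,
  # matching the 10-dim betas the demo code expects.
  fixed = shapedirs_300[:, :, :10].view(-1, 10)
  print(fixed.shape)  # torch.Size([20670, 10])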

Hi, I confirm I can reproduce the reported error with a 300-component shapedirs SMPL model, and @qinb's workaround solves the issue. In a nutshell, view(-1, 10) on a 300-component shapedirs yields a virtual batch_size of 30 = 300/10.

I followed the instructions in METRO and prepared the instructions in BSTRO the same way; I didn't expect users to re-use their existing SMPL model files. If @hosseinfeiz and @oscarfossey can confirm this addresses their issue, I can quickly push a hotfix. Big thanks to @qinb for the pointer!

qinb commented

@paulchhuang Hi, could you give some advice on your other repo? muelea/selfcontact#9
1. run_selfcontact_optimization.py runs very slowly. Besides decreasing the maxiter parameter, is there another way to speed it up?
2. ProHMR serves as the pose estimator; can the selfcontact repo be combined with ProHMR for joint training, for example with selfcontact acting as a loss supervisor?
Thanks again