Parsing non-standard sized documents
NicholasMcElroy opened this issue · 3 comments
Hello,
I've been using VILA on scientific publications and it works exceedingly well. I was wondering whether it could also be used on documents with non-standard page sizes (e.g. research posters). Currently, when I attempt to parse such a document, I get the following error:
Traceback (most recent call last):
  File "/home/nick/.local/lib/python3.9/site-packages/transformers/models/layoutlm/modeling_layoutlm.py", line 105, in forward
    left_position_embeddings = self.x_position_embeddings(bbox[:, :, 0])
  File "/home/nick/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nick/.local/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/home/nick/.local/lib/python3.9/site-packages/torch/nn/functional.py", line 2044, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/nick/Desktop/tests/ext_tests/test2.py", line 48, in <module>
    predicted_tokens = pdf_predictor.predict(pdf_data)
  File "/home/nick/.local/lib/python3.9/site-packages/vila/predictors.py", line 72, in predict
    model_outputs = self.model(**self.model_input_collator(model_inputs))
  File "/home/nick/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nick/.local/lib/python3.9/site-packages/vila/models/hierarchical_model.py", line 263, in forward
    outputs = self.hierarchical_model(
  File "/home/nick/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nick/.local/lib/python3.9/site-packages/vila/models/hierarchical_model.py", line 223, in forward
    embedded_lines = self.textline_model.embeddings(
  File "/home/nick/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/nick/.local/lib/python3.9/site-packages/transformers/models/layoutlm/modeling_layoutlm.py", line 110, in forward
    raise IndexError("The :obj:`bbox`coordinate values should be within 0-1000 range.") from e
IndexError: The :obj:`bbox`coordinate values should be within 0-1000 range.
I'm assuming that it has something to do with the dimensions of the document, but I wasn't completely sure. If there is any input that you can provide on potentially getting this to work I'd greatly appreciate it, thank you!
Thanks! In that case, I suggest you manually normalize all token coordinates to the 0-1000 range, since the code does not currently normalize token positions.
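For anyone landing here with the same error, here is a minimal sketch of that manual normalization. It assumes you know each page's width and height in the same units as the token boxes, and that boxes are `(x1, y1, x2, y2)`. The names `normalize_bbox` and `normalize_page_bboxes` are hypothetical helpers for illustration, not part of VILA; how you feed the rescaled boxes back into your own pipeline depends on your data layout.

```python
from typing import List, Sequence


def normalize_bbox(
    bbox: Sequence[float],
    page_width: float,
    page_height: float,
    target: int = 1000,
) -> List[int]:
    """Rescale one (x1, y1, x2, y2) box from page units to the 0..target range.

    Hypothetical helper, not a VILA function.
    """
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page_width and page_height must be positive")

    def scale(value: float, size: float) -> int:
        # Rescale, round to the nearest integer, then clamp so that boxes
        # slightly outside the page cannot produce an out-of-range index.
        scaled = int(round(value * target / size))
        return max(0, min(target, scaled))

    x1, y1, x2, y2 = bbox
    return [
        scale(x1, page_width),
        scale(y1, page_height),
        scale(x2, page_width),
        scale(y2, page_height),
    ]


def normalize_page_bboxes(
    bboxes: Sequence[Sequence[float]],
    page_width: float,
    page_height: float,
) -> List[List[int]]:
    """Apply normalize_bbox to every token box on a single page."""
    return [normalize_bbox(b, page_width, page_height) for b in bboxes]
```

For example, on a 3000 x 2000 poster page, `normalize_bbox((1500, 500, 3000, 1000), 3000, 2000)` gives `[500, 250, 1000, 500]`. Note that x and y are scaled independently, so the aspect ratio of the page is not preserved.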
You might want to check #16
Very cool. I had been normalizing the token coordinates as you suggested, and it's nice that this is part of the library now. Thank you! Looking forward to seeing the retrained model.