shabie/docformer

AssertionError:

deepanshudashora opened this issue · 12 comments

AssertionError                            Traceback (most recent call last)
<ipython-input-8-02f52eee118a> in <module>()
     25 
     26 tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
---> 27 encoding = dataset.create_features(fp, tokenizer)
     28 
     29 feature_extractor = modeling.ExtractFeatures(config)

/content/docformer/src/docformer/dataset.py in create_features(image, tokenizer, add_batch_dim, target_size, max_seq_length, path_to_save, save_to_disk, apply_mask_for_mlm, extras_for_debugging)
    259         "y_features": torch.as_tensor(a_rel_y, dtype=torch.int32),
    260         })
--> 261     assert torch.lt(encoding["x_features"], 0).sum().item() == 0
    262     assert torch.lt(encoding["y_features"], 0).sum().item() == 0
    263 

AssertionError: 

First I tried with a PNG image, then converted it to TIF, but it still gives this error.

Got it, that bug needs to be fixed, and for now you just need to remove the assertion statement. Maybe @shabie can describe why he added that statement, but if you remove it, everything should work. Can you clone the repo and remove the statement to see whether that works? The two lines to comment out are quoted below. Meanwhile, we will try to fix the error properly.
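As a temporary workaround (line numbers as in the traceback above), these are the two asserts in src/docformer/dataset.py to comment out:

```python
# src/docformer/dataset.py, create_features (around lines 261-262)
# assert torch.lt(encoding["x_features"], 0).sum().item() == 0
# assert torch.lt(encoding["y_features"], 0).sum().item() == 0
```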

Hope it helps

Traceback (most recent call last):
  File "test.py", line 31, in <module>
    v_bar, t_bar, v_bar_s, t_bar_s = feature_extractor(encoding)
  File "/home/deepanshu/anaconda3/envs/layoutlmv2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/deepanshu/Desktop/docformer/src/docformer/modeling.py", line 467, in forward
    v_bar_s, t_bar_s = self.spatial_feature(x_feature, y_feature)
  File "/home/deepanshu/anaconda3/envs/layoutlmv2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/deepanshu/Desktop/docformer/src/docformer/modeling.py", line 201, in forward
    x_calculated_embedding_v[..., i * sub_dim: (i + 1) * sub_dim] = self.x_embedding_v[i](temp_x)
  File "/home/deepanshu/anaconda3/envs/layoutlmv2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/deepanshu/anaconda3/envs/layoutlmv2/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/deepanshu/anaconda3/envs/layoutlmv2/lib/python3.6/site-packages/torch/nn/functional.py", line 2044, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

Now I am facing this issue regarding feature extraction from the document.

The reason is as follows:
Problem: the embedding layer does not accept negative integers as input.

In dataset.py, in get_relative_distance, this used to work because in a_rel_x.append we used to take the abs value. In my last commit I removed it, because transformers like LayoutLMv2 add a bias rather than taking the absolute value. So the error you got can be removed if you just take the absolute value (or rather add a bias, which I will be adding shortly).
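For reference, here is a tiny standalone repro of that failure mode (illustrative numbers, not docformer code):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=1024, embedding_dim=8)

feats = torch.tensor([[5, -3, 12]])   # a negative relative distance
# emb(feats) raises: IndexError: index out of range in self

out = emb(torch.abs(feats))           # abs() keeps every index in [0, 1024)
print(out.shape)                      # torch.Size([1, 3, 8])
```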

The implementation is like this:

```python
a_rel_x.append([
    curr[0],                                     # top left x
    curr[2],                                     # bottom right x
    abs(curr[2] - curr[0]),                      # width
    abs(curr[0] - prev[0]),                      # diff top left x
    abs(curr[0] - prev[0]),                      # diff bottom left x
    abs(curr[2] - prev[2]),                      # diff top right x
    abs(curr[2] - prev[2]),                      # diff bottom right x
    abs(centroids[i][0] - centroids[i - 1][0]),  # diff centroid x
])
```

I will add the bias value shortly. I hope you understand what I am trying to say.

After adding abs in a_rel_x and a_rel_y, I am now getting another error:

Traceback (most recent call last):
  File "test.py", line 31, in <module>
    v_bar, t_bar, v_bar_s, t_bar_s = feature_extractor(encoding)
  File "/home/deepanshu/anaconda3/envs/layoutlmv2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/deepanshu/Desktop/docformer/src/docformer/modeling.py", line 467, in forward
    v_bar_s, t_bar_s = self.spatial_feature(x_feature, y_feature)
  File "/home/deepanshu/anaconda3/envs/layoutlmv2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/deepanshu/Desktop/docformer/src/docformer/modeling.py", line 205, in forward
    v_bar_s = x_calculated_embedding_v + y_calculated_embedding_v + self.position_embeddings_v()
RuntimeError: The size of tensor a (511) must match the size of tensor b (512) at non-singleton dimension 1

In the same function, before the `return a_rel_x, a_rel_y`, these statements should be added:

```python
a_rel_x.append([0] * 8)
a_rel_y.append([0] * 8)
```

The reason for the shape mismatch is that for the last word (i.e., the 512th word), there is no forward bounding box.
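To make the shape issue concrete, here is a minimal sketch (the loop shape is assumed; the real get_relative_distance in dataset.py differs in detail). With n words, the pairwise loop yields only n - 1 feature rows, so one zero row is appended to restore length n:

```python
n = 512                                      # max_seq_length
a_rel_x = [[0] * 8 for _ in range(n - 1)]    # stand-in for the loop's output
a_rel_y = [[0] * 8 for _ in range(n - 1)]

a_rel_x.append([0] * 8)                      # pad for the last word, which has no forward box
a_rel_y.append([0] * 8)

assert len(a_rel_x) == len(a_rel_y) == n     # 512, matching the position embeddings
```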

All this was working fine before the changes, but I changed things a bit, and that is creating some bugs :)

I got the output as a tensor. Is there any way I can visualize the result?
Right now I am getting this:

tensor([[[ 18.4928, -25.6611,  21.7048,  ...,  35.3260,  26.1802,   1.4976],
         [  8.5524,  47.2246, -12.3269,  ...,  24.4158,  16.2197,   4.3333],
         [ 18.9814,  32.8647, -10.0837,  ..., -12.6006,  50.7735,  14.3890],
         ...,
         [  9.5308, -13.2916,  26.7846,  ...,  23.3564,  29.7507,  25.3716],
         [ 31.0107, -28.4132,  38.5171,  ...,   7.2900,  29.6359,  26.2114],
         [ 19.7608, -54.3426,  25.7954,  ...,  21.5742,  28.2972,  26.2053]]],
       grad_fn=<AddBackward0>)

Awesome. For visualization, you can take the resized_scaled_img and use the ToPILImage function of torchvision to convert it into an image. Then extract the bounding boxes for the image (you can get them from create_features) and plot them after denormalizing the bounding boxes.

Do let me know, if you need more help :)

I tried, but I am getting some noise instead of an image. Can you give me the code?

Having extracted the image from create_features, follow this:

```python
import torchvision

image = encoding['resized_scaled_img']  # encoding is the dictionary returned by create_features
final_img = torchvision.transforms.ToPILImage()(image)
```

Then plot the bounding boxes as normal (no need to normalize or denormalize them); a rough sketch of the drawing step follows below.
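A rough sketch of plotting the boxes with PIL (the bounding-box key name below is illustrative; inspect encoding.keys() to find the actual name your version of create_features returns):

```python
from PIL import ImageDraw

draw = ImageDraw.Draw(final_img)
for box in encoding["resized_and_aligned_bounding_boxes"]:  # hypothetical key name
    x0, y0, x1, y1 = [int(v) for v in box[:4]]              # top-left and bottom-right corners
    draw.rectangle([x0, y0, x1, y1], outline="red", width=1)

final_img.save("boxes.png")
```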
Hope it helps

@deepanshudashora Should we close this issue?

If there is no further problem, we will close this issue. Thanks!