huggingface/transformers

Add TableTransformerImageProcessor

NielsRogge opened this issue · 3 comments

Feature request

The Table Transformer is a model with basically the same architecture as DETR.

Now, when people do this:

from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
print(type(processor))

this will print DetrImageProcessor.

However, Table Transformer has some specific image processing settings which aren't exactly the same as in DETR:

from torchvision import transforms

class MaxResize:
    """Resize a PIL image so its longest side equals max_size, keeping the aspect ratio."""

    def __init__(self, max_size=800):
        self.max_size = max_size

    def __call__(self, image):
        # PIL reports size as (width, height)
        width, height = image.size
        current_max_size = max(width, height)
        scale = self.max_size / current_max_size
        resized_image = image.resize((int(round(scale * width)), int(round(scale * height))))

        return resized_image

# this is required for the table detection models
detection_transform = transforms.Compose([
    MaxResize(800),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# this is required for the table structure recognition models
structure_transform = transforms.Compose([
    MaxResize(1000),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

Hence we could create a separate TableTransformerImageProcessor which replicates this.
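
To make the difference between the two settings concrete, here is a minimal sketch that only computes the output dimensions MaxResize would produce, without touching any image data. The ResizePreset class, the DETECTION and STRUCTURE names, and the resized_dims helper are hypothetical and exist only for illustration; they are not part of transformers or the original Table Transformer repository. The numeric values are the ones from the two transforms above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResizePreset:
    """Hypothetical container for the settings above (not a transformers API)."""
    max_size: int
    mean: tuple = (0.485, 0.456, 0.406)
    std: tuple = (0.229, 0.224, 0.225)


# Values taken from the two transforms above; the names are illustrative only.
DETECTION = ResizePreset(max_size=800)
STRUCTURE = ResizePreset(max_size=1000)


def resized_dims(width, height, preset):
    """Return the (width, height) that MaxResize would produce for this preset."""
    scale = preset.max_size / max(width, height)
    return int(round(scale * width)), int(round(scale * height))


print(resized_dims(1200, 900, DETECTION))  # (800, 600)
print(resized_dims(1200, 900, STRUCTURE))  # (1000, 750)
print(resized_dims(400, 300, DETECTION))   # (800, 600)
```

Note that the longest side always ends up at max_size, so an image smaller than the target is upscaled rather than left untouched.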

Motivation

It would be great to replicate the original preprocessing settings exactly.

Your contribution

I could work on this, but it would be great if someone else could take it up.

@NielsRogge, I will do that.

Great, see https://github.com/microsoft/table-transformer/blob/16d124f616109746b7785f03085100f1f6247575/src/inference.py#L39-L49 as there's a difference between the detection model and the structure recognition models

@NielsRogge, just to reconfirm: we need an image_processing_table_transformer module defining TableTransformerImageProcessor, which applies the Table Transformer-specific transforms for the structure recognition and detection models.

Are there any other specifics or differences apart from that? I will try to find them myself in any case.