georgian-io/Multimodal-Toolkit

Preparing data for inferencing

Hi there,

I have been using the load_data function to create my datasets for HuggingFace's Trainer class, which has worked really well so far. However, I am not sure how to approach inference when my data has no label column, which load_data requires. I would really appreciate some advice, as I am rather new to this. Thank you!

Hi @insdrs480, for creating datasets we expect labeled data since it follows a train-eval-test loop. If you don't have labels, you can assign dummy labels to that particular column and discard it later. Please let me know if you need any other help!
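The dummy-label suggestion above can be sketched roughly as follows. This assumes the unlabeled data is already in a pandas DataFrame; the column names (`review_text`, `age`, `label`) and the helper `add_dummy_labels` are hypothetical and only for illustration, so substitute the label column name you used when training.

```python
import pandas as pd


def add_dummy_labels(df: pd.DataFrame, label_col: str = "label",
                     dummy_value: int = 0) -> pd.DataFrame:
    """Return a copy of df with a placeholder label column added.

    Hypothetical helper, not part of Multimodal-Toolkit.
    """
    out = df.copy()               # leave the caller's frame untouched
    out[label_col] = dummy_value  # same placeholder value on every row
    return out


# Hypothetical unlabeled data; real column names will differ.
unlabeled = pd.DataFrame(
    {"review_text": ["great fit", "too small"], "age": [34, 51]}
)
inference_df = add_dummy_labels(unlabeled)
```

The resulting frame can then be written out and passed through the same data-loading path as the labeled data, and the placeholder labels ignored when reading predictions. Choosing a dummy value that is one of the label values seen in training is probably safest, although that is an assumption and not something confirmed in this thread.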

Thank you for the response! Are we able to specify the model's behaviour when it encounters unknown categorical data?

I'm getting the following error, which I believe is due to my test dataset containing values that are different from what I trained the model on: RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x22 and 28x768)

Not at the moment, unfortunately. This is something we'd like to add, though. Do you have any suggestions as to what you would like to see?

Hi folks, I had the same issue and I suggested a solution in #61 to fix it in the load_data() function.

Thanks Doug!

I'm getting the following error, which I believe is due to my test dataset containing values that are different from what I trained the model on: RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x22 and 28x768)

@insdrs480 Apologies, I missed this bit! The model expects the same set of columns that were used for training, so you might be getting this error if they don't match. I would advise either setting default values for the missing columns or removing those columns from the training process entirely.
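One way to picture the "default values for missing columns" option is to one-hot encode the categorical columns yourself and force the test frame onto the training frame's column layout. The sketch below uses plain pandas; it is not Multimodal-Toolkit's API and not necessarily what the fix in #61 does, and the helper `one_hot_aligned` and the `colour` column are hypothetical.

```python
import pandas as pd


def one_hot_aligned(train_df: pd.DataFrame, test_df: pd.DataFrame,
                    cat_cols: list) -> tuple:
    """One-hot encode cat_cols so test has exactly the training columns.

    Hypothetical helper, not part of Multimodal-Toolkit.
    """
    train_enc = pd.get_dummies(train_df[cat_cols], dtype=int)
    test_enc = pd.get_dummies(test_df[cat_cols], dtype=int)
    # Categories seen only in test are dropped; categories missing from
    # test are added as all-zero columns, so the feature width matches.
    test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
    return train_enc, test_enc


# Hypothetical data: "purple" never appears in training.
train = pd.DataFrame({"colour": ["red", "green", "blue"]})
test = pd.DataFrame({"colour": ["red", "purple"]})
train_enc, test_enc = one_hot_aligned(train, test, ["colour"])
```

With this layout an unseen category such as "purple" becomes an all-zero row instead of changing the number of features, which is the kind of width mismatch (22 versus 28 here) that the shape error reports.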

Closing this issue as Doug's fix solves this. Feel free to re-open/create a new one if you run into any issues!