Fine-tuned ViLT model for scenic image classification with comments.
Fine-tunes the model on labeled images. Requires a large memory allocation and may crash as the dataset grows.
Preprocesses the data and stores the encodings in local storage. Requires a large amount of storage.
Resolves the growing memory use and the large storage requirement. Stable version.
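
The notes above do not say how the memory and storage problems were resolved. One common approach, shown here only as a minimal sketch and not as the method actually used, is to encode each sample on demand instead of encoding the whole dataset up front. The names `LazyEncodedDataset`, `iter_batches`, and `toy_encode` are hypothetical; in a real setup the `encode` callable would presumably wrap a ViLT processor and the class would be a PyTorch dataset.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical sample layout: (image_path, comment, label).
Sample = Tuple[str, str, int]


class LazyEncodedDataset:
    """Encodes a sample only when it is requested.

    Nothing is pre-computed or written to disk, so encodings are held
    only for the samples currently in use (for example, one batch).
    """

    def __init__(self, samples: Sequence[Sample],
                 encode: Callable[[str, str], dict]) -> None:
        self.samples = samples
        self.encode = encode  # assumed to wrap the real preprocessing step

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> dict:
        image_path, comment, label = self.samples[idx]
        encoding = self.encode(image_path, comment)
        encoding["label"] = label
        return encoding


def iter_batches(dataset: LazyEncodedDataset,
                 batch_size: int) -> Iterator[List[dict]]:
    """Yields lists of encoded samples, one batch at a time."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


def toy_encode(image_path: str, comment: str) -> dict:
    """Stand-in encoder for illustration; a real one would load the image."""
    return {"image_path": image_path, "num_tokens": len(comment.split())}
```

With this layout, each encoding exists only while its batch is being processed, at the cost of repeating the encoding work every epoch.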