DocOrNot is an image classification model that detect if an image is likely to be a text document or not. The model is trained on a dataset that is composed of images of text documents and images that are not.
The DocOrNot dataset was built using:
- RVL CDIP (Small) - https://www.kaggle.com/datasets/uditamin/rvl-cdip-small
- Flickr8k - https://www.kaggle.com/datasets/adityajn105/flickr8k
8k images were taken from the 8k Flickr dataset and from the RVL CDIP one.
You can get it here https://huggingface.co/datasets/tarekziade/docornot
The model was fine-tuned using facebook/deit-base-distilled-patch16-224