This repository obtains outputs from PaliGemma, a vision-language model, for object detection tasks and uses the predictions as labels, which can be visualized and refined through the VIA annotation tool from the VGG group.
Steps:
- Install dependencies
pip3 install -r requirements.txt
- Get an access token from Hugging Face and set it as an environment variable (a sketch of how the script can consume it follows these steps)
os.environ["HUGGINGFACE_API_TOKEN"] = "<enter the token here>"
- Put the images for which labels are needed in the images folder
- Execute the labelling script (a sketch of the underlying PaliGemma call follows these steps)
python3 auto_labelling_paligemma.py
- Download the VIA tool: https://www.robots.ox.ac.uk/~vgg/software/via/downloads/via-2.0.12.zip
- Open via.html and import the generated annotations (a sketch of the expected VIA JSON format follows these steps)
- Make sure the images folder is inside the via folder, as shown in step 4
- Adjust, add, or delete the annotations as needed
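The token step above sets the variable from inside Python. Below is a minimal sketch of how the script can fail fast when the token is missing and pass it explicitly to Hugging Face downloads; the model id is an illustrative assumption, and the `token=` keyword is the current transformers spelling (older releases used `use_auth_token=`):

```python
import os

from transformers import AutoProcessor

# Fail fast if the token from the setup step is missing.
token = os.environ.get("HUGGINGFACE_API_TOKEN")
if not token:
    raise RuntimeError("Set HUGGINGFACE_API_TOKEN before running the script.")

# Gated checkpoints such as PaliGemma require authentication to download.
# The model id here is an assumption, not necessarily the one the script uses.
processor = AutoProcessor.from_pretrained("google/paligemma-3b-mix-224", token=token)
```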
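PaliGemma performs detection when prompted with "detect <object>" and emits bounding boxes as <loc####> tokens. The following is a minimal sketch of what auto_labelling_paligemma.py plausibly does, not the script's actual code; the checkpoint, prompt, and image path are assumptions:

```python
import re

import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("images/turbine_001.jpg").convert("RGB")  # hypothetical file
prompt = "detect wind turbine"  # PaliGemma's detection prompt format: "detect <object>"

inputs = processor(text=prompt, images=image, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Keep only the generated continuation, dropping the prompt tokens.
decoded = processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Detections come back as "<loc####><loc####><loc####><loc####> label", where each
# <loc####> value is a coordinate normalised to [0, 1023] in the order
# y_min, x_min, y_max, x_max.
w, h = image.size
boxes = []
for y1, x1, y2, x2, label in re.findall(
        r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^<;]+)", decoded):
    boxes.append((label.strip(),
                  int(x1) / 1024 * w, int(y1) / 1024 * h,   # x_min, y_min in pixels
                  int(x2) / 1024 * w, int(y2) / 1024 * h))  # x_max, y_max in pixels
print(boxes)
```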
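VIA 2.x imports annotations from a JSON file keyed by filename plus file size, with rectangular regions described by x, y, width, and height in pixels. Here is a hedged sketch of converting pixel-space boxes like those above into that format; the example box, class name, and file name are made up:

```python
import json
import os

def to_via_region(x_min, y_min, x_max, y_max, label):
    """Build one VIA 2.x rectangular region from a pixel-space box."""
    return {
        "shape_attributes": {
            "name": "rect",
            "x": int(x_min),
            "y": int(y_min),
            "width": int(x_max - x_min),
            "height": int(y_max - y_min),
        },
        "region_attributes": {"class": label},
    }

filename = "turbine_001.jpg"  # hypothetical image inside the images folder
size = os.path.getsize(os.path.join("images", filename))
annotations = {
    f"{filename}{size}": {  # VIA keys each entry by filename + file size
        "filename": filename,
        "size": size,
        "regions": [to_via_region(120, 80, 340, 420, "wind turbine")],
        "file_attributes": {},
    }
}
with open("annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)
```

Importing annotations.json through Annotation > Import Annotations (from json) in via.html should then overlay the boxes on the matching images.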
References:
[1] https://github.com/NSTiwari/PaliGemma
[2] https://huggingface.co/docs/transformers/main/en/model_doc/paligemma
[3] https://www.kaggle.com/datasets/kylegraupe/wind-turbine-image-dataset-for-computer-vision
[4] https://www.robots.ox.ac.uk/~vgg/software/via/
Google Cloud credits are provided for this project #AISprint
If you use this project in your research, please cite it using the following BibTeX entry:
@misc{Bhat2024,
  author = {Rajesh Shreedhar Bhat},
  title = {Auto Labelling with Vision-Language Models},
  year = {2024},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/rajesh-bhat/auto_labelling_with_vlms}},
}