auto_labelling_with_vlms

Repo to obtain outputs from PaliGemma, a vision-language model, for object detection tasks and to use the predictions as labels, which can then be reviewed in the VIA annotation tool from the VGG group.
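
For context, the core idea is to prompt PaliGemma with a "detect ..." instruction and parse the <locNNNN> tokens it emits into pixel-space bounding boxes. Below is a minimal sketch of that step, assuming the google/paligemma-3b-mix-224 checkpoint, an example image path, and a simple regex parser; the actual auto_labelling_paligemma.py script may differ in these details.

```python
import os
import re
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Assumed checkpoint; the repo's script may pin a different PaliGemma variant.
MODEL_ID = "google/paligemma-3b-mix-224"
TOKEN = os.environ["HUGGINGFACE_API_TOKEN"]

processor = AutoProcessor.from_pretrained(MODEL_ID, token=TOKEN)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, token=TOKEN
)

image = Image.open("images/example.jpg")  # hypothetical file in the images folder
prompt = "detect wind turbine"  # PaliGemma's detection prompt format

inputs = processor(text=prompt, images=image, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens, e.g.
# "<loc0256><loc0128><loc0768><loc0896> wind turbine"
generated = output[0][inputs["input_ids"].shape[-1]:]
decoded = processor.decode(generated, skip_special_tokens=True)

# Each box is four <locNNNN> tokens (y_min, x_min, y_max, x_max),
# normalized to a 1024-bin grid; rescale them to pixel coordinates.
w, h = image.size
boxes = []
for y1, x1, y2, x2, label in re.findall(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^<;]+)", decoded
):
    boxes.append((
        int(x1) / 1024 * w, int(y1) / 1024 * h,
        int(x2) / 1024 * w, int(y2) / 1024 * h,
        label.strip(),
    ))
print(boxes)  # [(x_min, y_min, x_max, y_max, label), ...]
```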

Steps:

  1. Install the dependencies: pip3 install -r requirements.txt
  2. Get an access token from Hugging Face and set it as an environment variable: os.environ["HUGGINGFACE_API_TOKEN"] = "<enter the token here>"
  3. Put the images that need labels in the images folder
  4. Execute python3 auto_labelling_paligemma.py
  5. Download the VIA tool: https://www.robots.ox.ac.uk/~vgg/software/via/downloads/via-2.0.12.zip
  6. Open via.html and upload the generated annotations (a sketch of the annotation format follows this list)
  7. Make sure the images folder is inside the via folder so the tool can resolve the image paths
  8. Adjust, add, or delete annotations as needed
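
For reference, VIA 2 can import annotations as a JSON dictionary keyed by filename plus file size, with each rectangle stored under shape_attributes. Below is a minimal sketch of writing detections into that layout; the to_via_annotations helper and the "label" region-attribute name are illustrative assumptions, not necessarily the repo's exact output.

```python
import json
import os

def to_via_annotations(detections, image_dir="images"):
    """Convert {filename: [(x_min, y_min, x_max, y_max, label), ...]} into a
    VIA-2-style annotation dict. VIA keys each entry by filename plus file
    size; the "label" attribute name is an assumption for illustration."""
    via = {}
    for filename, boxes in detections.items():
        size = os.path.getsize(os.path.join(image_dir, filename))
        via[f"{filename}{size}"] = {
            "filename": filename,
            "size": size,
            "regions": [
                {
                    "shape_attributes": {
                        "name": "rect",
                        "x": int(x_min),
                        "y": int(y_min),
                        "width": int(x_max - x_min),
                        "height": int(y_max - y_min),
                    },
                    "region_attributes": {"label": label},
                }
                for x_min, y_min, x_max, y_max, label in boxes
            ],
            "file_attributes": {},
        }
    return via

# Example with one hypothetical image and box; the image must exist in the
# images folder for the file-size lookup to succeed.
detections = {"example.jpg": [(120, 80, 460, 340, "wind turbine")]}
with open("annotations.json", "w") as f:
    json.dump(to_via_annotations(detections), f, indent=2)
```

The resulting annotations.json can then be loaded through via.html as described in steps 6 and 7.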

References:
[1] https://github.com/NSTiwari/PaliGemma
[2] https://huggingface.co/docs/transformers/main/en/model_doc/paligemma
[3] https://www.kaggle.com/datasets/kylegraupe/wind-turbine-image-dataset-for-computer-vision
[4] https://www.robots.ox.ac.uk/~vgg/software/via/

Google Cloud credits were provided for this project. #AISprint

Cite This Work

If you use this project in your research, please cite it using the following BibTeX entry:

@misc{Bhat2024,
  author       = {Rajesh Shreedhar Bhat},
  title        = {Auto Labelling with Vision-Language Models},
  year         = {2024},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/rajesh-bhat/auto_labelling_with_vlms}},
}