grounding_sam

Models

This repository contains code which uses the following models:

Whisper and CLIPSeg can be installed from their respective repositories. Grounded-SAM and Panoptic-SAM each require the installation of two other models, which are linked below for easier reference.

For the official demos, please visit the individual repositories.

Scripts

The Jupyter Notebooks are mainly used for visualisation. The main functions in the Python files can be edited to accept command-line arguments, which makes batch processing faster; a rough sketch of such a wrapper follows.
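
The sketch below illustrates the kind of command-line wrapper this refers to. The function name run_pipeline and its parameters are placeholders, not names taken from this repository:

```python
import argparse

def run_pipeline(image_path: str, prompt: str, output_dir: str = "outputs") -> None:
    """Placeholder for one of the repository's main functions."""
    ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run the detection pipeline on one image")
    parser.add_argument("image_path", help="Path to the input image")
    parser.add_argument("--prompt", default="tree", help="Text prompt for detection")
    parser.add_argument("--output-dir", default="outputs", help="Directory for saved results")
    args = parser.parse_args()
    run_pipeline(args.image_path, args.prompt, args.output_dir)
```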

Note: fromaudio.ipynb is the only file that includes Whisper in the pipeline.
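
As a minimal sketch of how Whisper can turn speech into a text prompt for the rest of the pipeline (the audio file name and the downstream use of the transcript are assumptions, not details taken from fromaudio.ipynb):

```python
import whisper

# Load a Whisper model (the "base" size is an arbitrary choice here)
model = whisper.load_model("base")

# Transcribe a spoken description of the target objects
result = model.transcribe("audio_prompt.wav")
spoken_prompt = result["text"].strip()

# The transcript can then be passed to Grounding-DINO/CLIPSeg as the detection prompt
print(spoken_prompt)
```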

Code for the Panoptic-SAM pipeline is largely untouched, except that a set of fixed prompts is supplied to identify "things" (via Grounding-DINO) and "stuff" (via CLIPSeg). These prompts can be edited for more desirable results.
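
A minimal sketch of what such a fixed prompt split could look like; the prompt strings and names below are illustrative assumptions, not the values used in the code:

```python
# "Things" are countable objects passed to Grounding-DINO;
# "stuff" covers amorphous regions passed to CLIPSeg.
THING_PROMPTS = ["car", "person", "tree", "building"]  # hypothetical defaults
STUFF_PROMPTS = ["road", "grass", "sky", "water"]      # hypothetical defaults

def build_prompts(extra_things=None, extra_stuff=None):
    """Merge the fixed prompts with any user-supplied additions."""
    things = THING_PROMPTS + list(extra_things or [])
    stuff = STUFF_PROMPTS + list(extra_stuff or [])
    return things, stuff
```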

Potential Improvements/Expansion

  • Introduce auto captioning to generate prompts (see the sketch after this list)
  • Post-processing of results to improve accuracy and scope of detection
  • Link results to LiDAR data
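
One way the auto-captioning idea could work is sketched below. It assumes a general-purpose captioner (here BLIP via the transformers library) whose output is reused as a detection prompt; this is not something already wired into the repository:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumption: BLIP generates a caption that then serves as the prompt for Grounding-DINO/CLIPSeg
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_to_prompt(image_path: str) -> str:
    """Generate a caption for an image and return it as a detection prompt."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```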

Known Limitations

  • "Roads"/"Car parks" are not identified well
  • Less common objects (e.g. metal ventilation doors) are not identified well
  • Relative object detection is not accurate
  • Sparsity of trees affects whether CLIPSeg or Grounding-DINO provides better results
  • Detection is sensitive to lighting and the orientation of the object