grounding_sam

Models

This repository contains code which uses the following models:

Whisper and CLIPSeg can be installed from their respective repositories. Grounded-SAM and Panoptic-SAM each require the installation of two other models, which are linked below for easier reference.

For the official demos, please visit the individual repositories.

Scripts

The Jupyter Notebooks are mainly used for visualisation. The main functions in the Python files can be edited to accept command-line arguments, which makes batch processing faster; a rough sketch of such a wrapper follows.
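
The sketch below illustrates the kind of command-line wrapper this refers to. The function name run_pipeline and its parameters are placeholders, not names taken from this repository:

```python
import argparse

def run_pipeline(image_path: str, prompt: str, output_dir: str = "outputs") -> None:
    """Placeholder for one of the repository's main functions."""
    ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run the detection pipeline on one image")
    parser.add_argument("image_path", help="Path to the input image")
    parser.add_argument("--prompt", default="tree", help="Text prompt for detection")
    parser.add_argument("--output-dir", default="outputs", help="Directory for saved results")
    args = parser.parse_args()
    run_pipeline(args.image_path, args.prompt, args.output_dir)
```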

Note: fromaudio.ipynb is the only file that includes Whisper in the pipeline.
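
As a minimal sketch of how Whisper can turn speech into a text prompt for the rest of the pipeline (the audio file name and the downstream use of the transcript are assumptions, not details taken from fromaudio.ipynb):

```python
import whisper

# Load a Whisper model (the "base" size is an arbitrary choice here)
model = whisper.load_model("base")

# Transcribe a spoken description of the target objects
result = model.transcribe("audio_prompt.wav")
spoken_prompt = result["text"].strip()

# The transcript can then be passed to Grounding-DINO/CLIPSeg as the detection prompt
print(spoken_prompt)
```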

Code for the Panoptic-SAM pipeline is largely untouched, except that a set of fixed prompts is supplied to identify "things" (via Grounding-DINO) and "stuff" (via CLIPSeg). These prompts can be edited for more desirable results.
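
A minimal sketch of what such a fixed prompt split could look like; the prompt strings and names below are illustrative assumptions, not the values used in the code:

```python
# "Things" are countable objects passed to Grounding-DINO;
# "stuff" covers amorphous regions passed to CLIPSeg.
THING_PROMPTS = ["car", "person", "tree", "building"]  # hypothetical defaults
STUFF_PROMPTS = ["road", "grass", "sky", "water"]      # hypothetical defaults

def build_prompts(extra_things=None, extra_stuff=None):
    """Merge the fixed prompts with any user-supplied additions."""
    things = THING_PROMPTS + list(extra_things or [])
    stuff = STUFF_PROMPTS + list(extra_stuff or [])
    return things, stuff
```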

Potential Improvements/Expansion

  • Introduce auto captioning to generate prompts (see the sketch after this list)
  • Post-processing of results to improve accuracy and scope of detection
  • Link results to LiDAR data
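
One way the auto-captioning idea could work is sketched below. It assumes a general-purpose captioner (here BLIP via the transformers library) whose output is reused as a detection prompt; this is not something already wired into the repository:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumption: BLIP generates a caption that then serves as the prompt for Grounding-DINO/CLIPSeg
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_to_prompt(image_path: str) -> str:
    """Generate a caption for an image and return it as a detection prompt."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```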

Known Limitations

  • "Roads"/"Car parks" are not identified well
  • Less common objects (e.g. metal ventilation doors) are not identified well
  • Relative object detection is not accurate
  • Sparsity of trees affects whether CLIPSeg or Grounding-DINO provides better results
  • Detection is sensitive to lighting and the orientation of the object