Grounded Tracking for Streaming Videos

Primary language: Jupyter Notebook. License: Apache-2.0.

Streaming Grounded SAM 2

Grounded SAM 2 for streaming video tracking using natural language queries.

Demo

Framework

The system consists of three components:

  • LLM: This module is responsible for parsing the input query or inferring the intended object.
  • GroundingDINO: This component handles object referencing.
  • SAM-2: This part specializes in object tracking.
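
The data flow between the three components can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `GroundedTracker`, its fields, and the stub functions are hypothetical names, and tracking is simplified to boxes even though SAM 2 itself propagates masks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class GroundedTracker:
    """Hypothetical glue object; none of these names come from the repository."""
    parse_query: Callable[[str], str]                # LLM: free-form query -> object phrase
    detect: Callable[[object, str], List[Box]]       # GroundingDINO: frame + phrase -> boxes
    track: Callable[[object, List[Box]], List[Box]]  # SAM 2: frame + previous boxes -> new boxes

    def run(self, query, frames):
        phrase = self.parse_query(query)
        boxes = None
        results = []
        for frame in frames:
            if not boxes:
                # Nothing tracked yet (or the target was lost): ground the phrase.
                boxes = self.detect(frame, phrase)
            else:
                # Otherwise propagate the previous result to the new frame.
                boxes = self.track(frame, boxes)
            results.append(boxes)
        return phrase, results


# Stand-in components so the sketch runs without any model weights.
def fake_llm(query):
    return query.lower().replace("track the ", "").strip(" .")


def fake_detect(frame, phrase):
    return [(0.0, 0.0, 10.0, 10.0)]


def fake_track(frame, boxes):
    return [(x1 + 1, y1, x2 + 1, y2) for (x1, y1, x2, y2) in boxes]


pipeline = GroundedTracker(fake_llm, fake_detect, fake_track)
phrase, results = pipeline.run("Track the red cup.", frames=[0, 1, 2])
```

Keeping the three stages behind plain callables makes it easy to swap the LLM (GPT-4o or Qwen2) without touching the detection and tracking steps.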

Getting Started

Installation

  1. Prepare the environment
conda create -n sam2 python=3.10 -y
conda activate sam2
pip install -e .
  2. Download SAM 2 checkpoints
cd checkpoints
./download_ckpts.sh
  3. Download Grounding DINO checkpoints
cd gdino_checkpoints
./download_ckpts.sh

Or use the Hugging Face version (recommended):

cd gdino_checkpoints
huggingface-cli download IDEA-Research/grounding-dino-tiny --local-dir grounding-dino-tiny
  4. Download LLM

4.1 GPT-4o (recommended)

cd llm
touch .env

Paste your API_KEY, plus your API_BASE if you use Azure:

API_KEY="xxx"
API_BASE="xxx"
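
For reference, a file in this format can be read with a few lines of standard-library Python. This is only a sketch: how the repository actually loads `.env` is not stated here, and `parse_env` and `load_env` are hypothetical helper names.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, tolerating spaces around '=' and optional quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path=".env"):
    """Read a .env file from disk and return its entries as a dict."""
    return parse_env(Path(path).read_text())


# Example with the placeholder values shown above.
config = parse_env('API_KEY="xxx"\nAPI_BASE="xxx"\n')
```

Non-Azure users would only have `API_KEY` in the resulting dict, so code that reads `API_BASE` should use `config.get("API_BASE")` rather than indexing.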

4.2 Qwen2

cd llm_checkpoints
huggingface-cli download Qwen/Qwen2-7B-Instruct-AWQ --local-dir Qwen2-7B-Instruct-AWQ

Install the corresponding packages.

Run Demo

Step-1: Check available camera

python cam_detect.py

If a camera is detected, set the corresponding camera in demo.py.
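
If you want to see what such a check typically involves, the sketch below probes a few device indices with OpenCV. It is an assumption about the general approach, not the contents of cam_detect.py; `find_cameras` and its `probe` parameter are hypothetical names, and the default probe requires the opencv-python package.

```python
def find_cameras(max_index=5, probe=None):
    """Return the device indices in range(max_index) for which probe(index) is true."""
    if probe is None:
        import cv2  # imported lazily so the function can be used with a custom probe

        def probe(index):
            cap = cv2.VideoCapture(index)
            try:
                if not cap.isOpened():
                    return False
                ok, _ = cap.read()  # a device may open but still fail to deliver frames
                return ok
            finally:
                cap.release()

    return [i for i in range(max_index) if probe(i)]


# Example with a stand-in probe (pretend devices 0 and 2 exist).
available = find_cameras(4, probe=lambda i: i in (0, 2))
```

Calling `find_cameras()` with no arguments uses the real OpenCV probe; one of the returned indices is then what you would put into demo.py, assuming it opens the camera by index.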

Step-2: Run demo

Currently available models: Qwen2-7B-Instruct-AWQ, gpt-4o-2024-05-13

python demo.py --model gpt-4o-2024-05-13
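
A `--model` flag restricted to the two names above could be declared as in the sketch below. This is illustrative only: the actual argument handling in demo.py may differ, and the default value chosen here is an assumption.

```python
import argparse

# The two model names listed above.
AVAILABLE_MODELS = ["Qwen2-7B-Instruct-AWQ", "gpt-4o-2024-05-13"]


def build_parser():
    parser = argparse.ArgumentParser(description="Streaming Grounded SAM 2 demo (sketch)")
    parser.add_argument(
        "--model",
        choices=AVAILABLE_MODELS,
        default="gpt-4o-2024-05-13",  # assumed default, not confirmed by the repository
        help="LLM used to parse the natural language query",
    )
    return parser


def parse_args(argv=None):
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


args = parse_args(["--model", "Qwen2-7B-Instruct-AWQ"])
```

With `choices` set, argparse rejects any other model name with a usage error before any checkpoints are loaded.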

Acknowledgements: