Simple Finetuner for Segment Anything

This repository contains a simple starter code for finetuning the FAIR Segment Anything (SAM) models leveraging the convenience of PyTorch Lightning.

Setup

Install dependencies

First run

git clone --recurse-submodules git@github.com:bhpfelix/segment-anything-finetuner.git

Then

cd segment-anything-finetuner

Follow the setup instruction of Segment Anything to install the proper dependencies. Then run

pip install -r requirements.txt

Data preparation

The starter code supports Coco format input with the following layout

├── dataset_name/
│   ├── train/
│   │   ├── _annotations.coco.json # COCO format annotation
│   │   ├── 000001.png             # Images
│   │   ├── 000002.png
│   │   ├── ...
│   ├── val/
│   │   ├── _annotations.coco.json # COCO format annotation
│   │   ├── xxxxxx.png             # Images
│   │   ├── ...

Download model checkpoints

Download the necessary SAM model checkpoints and arrange the repo as follows:

├── dataset_name/              # structure as detailed above
│   ├── ...
├── segment-anything/          # The FAIR SAM repo
│   ├── ...
├── SAM/                       # the SAM pretrained checkpoints
│   ├── sam_vit_h_4b8939.pth
│   ├── ...
├── finetune.py
├── ...

Finetuning (`finetune.py`)

This file contains a simple finetuning script for the Segment Anything model on Coco format datasets.

Example usage:

python finetune.py \
    --data_root ./dataset_name \
    --model_type vit_h \
    --checkpoint_path ./SAM/sam_vit_h_4b8939.pth \
    --freeze_image_encoder \
    --batch_size 2 \
    --image_size 1024 \
    --steps 1500 \
    --learning_rate 1.e-5 \
    --weight_decay 0.01

We can optionally use the --freeze_image_encoder flag to detach the image encoder parameters from optimization and save GPU memory.

Notes

As of now the image resizing implementation is different from the ResizeLongestSide transform in SAM.
Drop path and layer-wise learning rate decay are not currently applied.
The finetuning script currently only supports bounding box input prompts.

Resources

Citation

@article{kirillov2023segany,
  title={Segment Anything},
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}