SAM (the Segment Anything Model) is a powerful image segmentation model designed to accurately predict pixel-level masks for a wide range of objects within an image. It consists of three parts (see the sketch after this list):
- a heavy vision transformer backbone (the image encoder) that generates image features.
- a lightweight prompt encoder that creates sparse and dense embeddings from the prompt inputs (points, boxes, and/or masks).
- a mask decoder that takes the outputs of the image encoder and the prompt encoder and produces masks.
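As a minimal sketch of how these three components show up in practice, here is how they are exposed as submodules in the Hugging Face `transformers` implementation of SAM (an assumption about that particular implementation; the original paper code is organized along the same lines):

```python
from transformers import SamModel

model = SamModel.from_pretrained("facebook/sam-vit-base")

print(model.vision_encoder)  # heavy ViT backbone: image -> image embeddings
print(model.prompt_encoder)  # lightweight module: prompts -> sparse/dense embeddings
print(model.mask_decoder)    # combines both sets of embeddings into masks
```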
The image is passed through the image encoder; its latent features are then combined with the prompt embeddings and fed into the mask decoder, which generates the output masks.
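The snippet below is a hedged end-to-end example of this flow using the Hugging Face `transformers` SAM API; the image path and the point prompt coordinates are placeholders, not values from this demonstration:

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base")

image = Image.open("example.png").convert("RGB")  # hypothetical input image
input_points = [[[450, 600]]]  # one (x, y) point prompt for the image

# The processor resizes the image and packages the prompt for the model.
inputs = processor(image, input_points=input_points, return_tensors="pt")

with torch.no_grad():
    # image encoder -> prompt encoder -> mask decoder, as described above
    outputs = model(**inputs)

# Upscale the low-resolution predicted masks back to the original image size.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks, inputs["original_sizes"], inputs["reshaped_input_sizes"]
)
print(masks[0].shape, outputs.iou_scores)  # binary masks and their quality scores
```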
SAM paper: https://arxiv.org/pdf/2304.02643.pdf
Link to the dataset used in this demonstration: https://www.kaggle.com/insaff/massachusetts-roads-dataset