In this project, we use the Segment Anything Model (SAM) released by Meta to capture masks of product packshots. This is challenging because some images have shadows, reflections, or even logos that need to be taken into account.
I chose the ViT-B image encoder and applied LoRA to the attention modules inside it. I adapted only the query and value projections, as the LoRA paper suggests this gives better results. Bounding boxes are used as the input prompts.
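To illustrate what adapting only queries and values means, here is a minimal PyTorch sketch. It assumes a fused qkv linear layer, as in the attention blocks of SAM's ViT encoder. The class name LoRAQKV and the rank and alpha defaults are hypothetical and are not taken from this repo's code.

```python
import math

import torch
import torch.nn as nn


class LoRAQKV(nn.Module):
    """Wrap a fused qkv projection and add low-rank updates to q and v only.

    Hypothetical sketch: the name and defaults are illustrative assumptions.
    """

    def __init__(self, qkv: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.qkv = qkv
        self.dim = qkv.in_features
        self.scale = alpha / rank
        # Freeze the pretrained projection; only the LoRA matrices are trained.
        for p in self.qkv.parameters():
            p.requires_grad = False
        # Low-rank pairs (A: dim -> rank, B: rank -> dim) for queries and values.
        self.a_q = nn.Linear(self.dim, rank, bias=False)
        self.b_q = nn.Linear(rank, self.dim, bias=False)
        self.a_v = nn.Linear(self.dim, rank, bias=False)
        self.b_v = nn.Linear(rank, self.dim, bias=False)
        for a in (self.a_q, self.a_v):
            nn.init.kaiming_uniform_(a.weight, a=math.sqrt(5))
        # B starts at zero, so the wrapped layer initially matches the original.
        for b in (self.b_q, self.b_v):
            nn.init.zeros_(b.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        qkv = self.qkv(x)  # (..., 3 * dim), laid out as [q | k | v]
        q, k, v = qkv.split(self.dim, dim=-1)
        q = q + self.scale * self.b_q(self.a_q(x))
        v = v + self.scale * self.b_v(self.a_v(x))
        return torch.cat([q, k, v], dim=-1)  # keys are left untouched
```

Because the B matrices start at zero, wrapping a layer does not change the model's output until training updates them.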
Get the repo with:
git clone https://github.com/MathieuNlp/Sam_LoRA.git
A Gradio demo is available. Load your image and place 2 points to form a bounding box, then run the mask generation.
demo.ipynb
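As a rough sketch of how two clicked points can be turned into a box prompt, the helper below orders the coordinates into (x_min, y_min, x_max, y_max) form. The function name box_from_points is hypothetical, and the demo notebook may do this differently.

```python
def box_from_points(p1, p2):
    """Build an [x_min, y_min, x_max, y_max] box from two (x, y) corner points.

    The points can be clicked in any order; hypothetical helper for illustration.
    """
    (x1, y1), (x2, y2) = p1, p2
    return [min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)]


# Clicking bottom-left first and top-right second still gives a valid box.
box = box_from_points((120, 200), (30, 40))
```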
A config file lists the hyperparameters used to tune the model, along with some paths.
config.yaml
All the dependencies are managed with Poetry.
cd sam_lora_poetry
poetry install
If there is an error with PyTorch, Safetensors or CV2, run:
poetry run pip install opencv-python safetensors torch==1.12.1+cu116 torchvision==0.13.1+cu116 -f https://download.pytorch.org/whl/torch_stable.html
Download the SAM ViT-B checkpoint:
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
Training does not use Meta's SamPredictor because I wanted to handle batches. Instead, I created a processor.py file in /src that processes the images and prompts accordingly. The saved weights are in lora.safetensors.
poetry run python train.py
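One thing a batch processor has to handle is rescaling each box prompt to the resized image. The sketch below is not the repo's processor.py. It assumes the usual SAM preprocessing, where the longest image side is resized to 1024 pixels, and all function names are hypothetical.

```python
def resized_shape(height, width, target=1024):
    """Return (new_h, new_w) after scaling the longest side to `target`."""
    scale = target / max(height, width)
    return int(height * scale + 0.5), int(width * scale + 0.5)


def rescale_box(box, original_size, target=1024):
    """Map an [x0, y0, x1, y1] box from original pixels to the resized image."""
    old_h, old_w = original_size
    new_h, new_w = resized_shape(old_h, old_w, target)
    sx, sy = new_w / old_w, new_h / old_h
    x0, y0, x1, y1 = box
    return [x0 * sx, y0 * sy, x1 * sx, y1 * sy]


def rescale_batch(boxes, original_sizes, target=1024):
    """Rescale one box per image, where images may have different sizes."""
    return [rescale_box(b, s, target) for b, s in zip(boxes, original_sizes)]
```

Handling each image's original size separately is what lets images of different resolutions share a batch once they are all resized to the same input resolution.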
Run inference with the weights saved during training.
poetry run python inference.py
The plots folder gathers some comparisons and results for visualization.
- comparaison.png: Ground truth masks (top) and predicted masks (bottom), plotted during training.
- gt_mask.jpg: Ground truth mask example.
- perfume2_notraining.jpg: Perfume 2 mask predicted by the model with no training.
- perfume2.jpg: Perfume 2 mask predicted by the model trained for 10 epochs.
- pred_perfume2_no_training.jpg: Original image and predicted mask visualisation.
Thank you to:
- HuggingFace: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SAM/Fine_tune_SAM_(segment_anything)_on_a_custom_dataset.ipynb
- JamesQFreeman: https://github.com/JamesQFreeman/Sam_LoRA
Mathieu Nalpon