This project combines OSTrack and SAM to do video object segmentation.
OSTrack:https://github.com/botaoye/OSTrack
SAM:https://github.com/facebookresearch/segment-anything
In this project, we interact with SAM (box or point) to obtain the mask of the first frame, and then use the mask to obtain the corresponding box as the initial box for tracking.
Using OSTrack, we can obtain the tracking boxes for subsequent frames, and based on these boxes, we can further obtain mask results through SAM to achieve the purpose of VOS.
The pipeline is as follows:
conda create -n ostrack python=3.8
conda activate ostrack
bash install.sh
python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output
Download pre-trained weights for OSTrack and SAM, for example, OSTrack_ep0300.pth.tar
and sam_vit_h_4b8939.pth
.
Put OSTrack_ep0300.pth.tar
in ./output/checkpoints/train/ostrack/vitb_384_mae_ce_32x4_ep300
.
Set the path of images in ./tracking/samdemo.py
.
Set the path of SAM weight in ./lib/test/evaluation/tracker.py
.
python ./tracking/samdemo.py ostrack
This project has two modes to get the initial box, input b
for box mode, p
for point mode.
In box mode, you can draw a rectangle as the initial box, and press Enter
to start video object segmentation.
In point mode, you can click on the image to set points and get the initial box, there are two kinds of points (You can read SAM project for more details):
- Positive points: left mouse button.
- Negitive points: ctrl + left mouse button.
Press r
to reset all points and press f
to start video object segmentation.
The results should be as follows (gt, img, pred):