AIDajiangtang/Segment-Anything-CPP

Checkpoints?

Opened this issue · 4 comments

Could you provide instructions on how to get the ONNX versions of the encoder & decoder?

In the official repository, only the code to export the decoder to ONNX format is included, but in this fork, https://github.com/visheratin/segment-anything/, the encoder export has been added.

You can use the command below to generate both the encoder and decoder ONNX models:
python scripts/export_onnx_model.py --checkpoint <path/to/checkpoint> --model-type <model_type> --encoder-output <path/to/encoder output> --decoder-output <path/to/decoder output>

For the checkpoint parameter:
it is the original PyTorch-format pretrained model.
default or vit_h: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
vit_l: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
vit_b: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth

For the model-type parameter:
one of ['default', 'vit_h', 'vit_l', 'vit_b'], specifying which type of SAM model to export.

Other parameters can be set according to your needs.
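The checkpoint and model-type lists above can be captured in a small helper. This is a hypothetical convenience function, not part of the export script; the URLs are the official ones listed above.

```python
# Map --model-type values to the official checkpoint URLs listed above.
CHECKPOINT_URLS = {
    "default": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    "vit_h": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    "vit_l": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth",
    "vit_b": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth",
}

def checkpoint_url(model_type: str) -> str:
    """Return the pretrained-checkpoint URL for a given --model-type value."""
    if model_type not in CHECKPOINT_URLS:
        raise ValueError(f"model_type must be one of {sorted(CHECKPOINT_URLS)}")
    return CHECKPOINT_URLS[model_type]
```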

You can also directly download the converted models:
!wget https://huggingface.co/visheratin/segment-anything-vit-b/resolve/main/encoder-quant.onnx
!wget https://huggingface.co/visheratin/segment-anything-vit-b/resolve/main/decoder-quant.onnx

Thanks! I also know of https://github.com/vietanhdev/samexporter, which looks nice (and it's packaged). I just don't know if it does the same thing (I know that certain things were hardcoded in one of the ONNX encoders, like the image size).
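For context on the hardcoded image size: SAM's standard preprocessing resizes the longest side of the image to 1024 and pads the rest to a 1024x1024 square, which is why an encoder exported with fixed spatial dimensions still works on arbitrary images. A minimal NumPy-only sketch of that idea (nearest-neighbor resize here is a stand-in; the real pipeline also normalizes pixel values):

```python
import numpy as np

def preprocess(image: np.ndarray, target_length: int = 1024) -> np.ndarray:
    """Resize the longest side to target_length, then zero-pad to a square."""
    h, w = image.shape[:2]
    scale = target_length / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize via index mapping (no external dependencies).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows[:, None], cols[None, :]]
    # Pad bottom/right with zeros up to target_length x target_length.
    padded = np.zeros((target_length, target_length) + image.shape[2:],
                      dtype=image.dtype)
    padded[:new_h, :new_w] = resized
    return padded

img = np.ones((600, 800, 3), dtype=np.uint8)
out = preprocess(img)  # shape (1024, 1024, 3); image content in the top-left 768x1024
```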

They all implement the ONNX export by calling the interface below:

torch.onnx.export(model, args, f, export_params=True, verbose=False, input_names=None, output_names=None, opset_version=None, do_constant_folding=True, dynamic_axes=None, keep_initializers_as_inputs=None)

but they pass slightly different parameters to torch.onnx.export. For example, with dynamic_axes, vietanhdev's code makes the image size of the encoder input dynamic:
dynamic_axes = {
    "input_image": {0: "image_height", 1: "image_width"},
}
and visheratin's code fixes the input image size while making the batch size dynamic. You can combine the two codebases as references, according to your needs.
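The two choices can be contrasted as plain dynamic_axes dicts. The spatial variant is taken from the snippet above; the batch variant is an illustration of the idea (the actual input name and axis index in visheratin's export may differ):

```python
# vietanhdev-style: the spatial dimensions of the encoder input are dynamic,
# so images of any size can be fed in (assuming HWC input layout).
dynamic_axes_spatial = {
    "input_image": {0: "image_height", 1: "image_width"},
}

# visheratin-style (illustrative): fixed spatial size, dynamic batch dimension,
# assuming the batch sits on axis 0 of a batched input tensor.
dynamic_axes_batch = {
    "input_image": {0: "batch_size"},
}

# Either dict would be passed to the export call, e.g.:
# torch.onnx.export(encoder, dummy_input, "encoder.onnx",
#                   input_names=["input_image"],
#                   output_names=["image_embeddings"],
#                   dynamic_axes=dynamic_axes_spatial,
#                   opset_version=17)
```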

Thanks for the explanation!