AIDajiangtang/Segment-Anything-CPP

Checkpoints?

Opened this issue · 4 comments

Could you provide instructions on how to get the ONNX versions of the encoder & decoder?

In the official repository, only the code to export the decoder to ONNX format is included, but in this fork, https://github.com/visheratin/segment-anything/, the encoder export has been added.

You can use the command below to generate both the encoder and decoder ONNX models:
python scripts/export_onnx_model.py --checkpoint <path/to/checkpoint> --model-type <model_type> --encoder-output <path/to/encoder output> --decoder-output <path/to/decoder output>

For the checkpoint parameter:
it is the original PyTorch-format pretrained model.
default or vit_h: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
vit_l: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
vit_b: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth

For the model-type parameter:
one of ['default', 'vit_h', 'vit_l', 'vit_b'], specifying which type of SAM model to export.

Other parameters can be set according to your needs.
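The checkpoint and model-type lists above can be captured in a small helper. This is a hypothetical convenience function, not part of the export script; the URLs are the official ones listed above.

```python
# Map --model-type values to the official checkpoint URLs listed above.
CHECKPOINT_URLS = {
    "default": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    "vit_h": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    "vit_l": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth",
    "vit_b": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth",
}

def checkpoint_url(model_type: str) -> str:
    """Return the pretrained-checkpoint URL for a given --model-type value."""
    if model_type not in CHECKPOINT_URLS:
        raise ValueError(f"model_type must be one of {sorted(CHECKPOINT_URLS)}")
    return CHECKPOINT_URLS[model_type]
```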

You can also directly download the converted models:
!wget https://huggingface.co/visheratin/segment-anything-vit-b/resolve/main/encoder-quant.onnx
!wget https://huggingface.co/visheratin/segment-anything-vit-b/resolve/main/decoder-quant.onnx

Thanks! I also know of https://github.com/vietanhdev/samexporter, which looks nice (and it's packaged). I just don't know if it does the same thing (I know that certain things were hardcoded in one of the ONNX encoders, like the image size).
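For context on the hardcoded image size: SAM's standard preprocessing resizes the longest side of the image to 1024 and pads the rest to a 1024x1024 square, which is why an encoder exported with fixed spatial dimensions still works on arbitrary images. A minimal NumPy-only sketch of that idea (nearest-neighbor resize here is a stand-in; the real pipeline also normalizes pixel values):

```python
import numpy as np

def preprocess(image: np.ndarray, target_length: int = 1024) -> np.ndarray:
    """Resize the longest side to target_length, then zero-pad to a square."""
    h, w = image.shape[:2]
    scale = target_length / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor resize via index mapping (no external dependencies).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows[:, None], cols[None, :]]
    # Pad bottom/right with zeros up to target_length x target_length.
    padded = np.zeros((target_length, target_length) + image.shape[2:],
                      dtype=image.dtype)
    padded[:new_h, :new_w] = resized
    return padded

img = np.ones((600, 800, 3), dtype=np.uint8)
out = preprocess(img)  # shape (1024, 1024, 3); image content in the top-left 768x1024
```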

They all implement the ONNX export by calling the interface below:

torch.onnx.export(model, args, f, export_params=True, verbose=False, input_names=None, output_names=None, opset_version=None, do_constant_folding=True, dynamic_axes=None, keep_initializers_as_inputs=None)

but they pass slightly different parameters to torch.onnx.export. For example, with dynamic_axes, vietanhdev's code makes the image size of the encoder input dynamic:
dynamic_axes = {
    "input_image": {0: "image_height", 1: "image_width"},
}
and visheratin's code fixes the input image size while making the batch size dynamic. You can combine the two codebases as references, according to your needs.
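The two choices can be contrasted as plain dynamic_axes dicts. The spatial variant is taken from the snippet above; the batch variant is an illustration of the idea (the actual input name and axis index in visheratin's export may differ):

```python
# vietanhdev-style: the spatial dimensions of the encoder input are dynamic,
# so images of any size can be fed in (assuming HWC input layout).
dynamic_axes_spatial = {
    "input_image": {0: "image_height", 1: "image_width"},
}

# visheratin-style (illustrative): fixed spatial size, dynamic batch dimension,
# assuming the batch sits on axis 0 of a batched input tensor.
dynamic_axes_batch = {
    "input_image": {0: "batch_size"},
}

# Either dict would be passed to the export call, e.g.:
# torch.onnx.export(encoder, dummy_input, "encoder.onnx",
#                   input_names=["input_image"],
#                   output_names=["image_embeddings"],
#                   dynamic_axes=dynamic_axes_spatial,
#                   opset_version=17)
```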

Thanks for the explanation!