This package simplifies SentenceTransformer model optimization using ONNX/Optimum while preserving the easy inference of SentenceTransformer's model.encode. Model optimization can lead to up to 40% lower inference latency on CPU.
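To check what the latency gain looks like on your own workload, a minimal timing harness like the one below can be used. This is a sketch: `dummy_encode` is a stand-in placeholder, not part of this package; in practice you would pass `model.encode` and `optim_model.encode` and compare the two numbers.

```python
import time

def time_encode(encode_fn, sentences, n_runs=20):
    """Return the mean per-call latency (in seconds) of an encode function."""
    encode_fn(sentences)  # warm-up call so one-time initialization is excluded
    start = time.perf_counter()
    for _ in range(n_runs):
        encode_fn(sentences)
    return (time.perf_counter() - start) / n_runs

# stand-in encode function (hypothetical); replace with model.encode
# or optim_model.encode to benchmark real models
dummy_encode = lambda sents: [[0.0] * 8 for _ in sents]

latency = time_encode(dummy_encode, ["hello world"])
```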
If your production code uses SentenceTransformer's model.encode, this package enables easy integration of optimized models with minimal code changes.
Requires Python 3.8+.

Install from PyPI:
pip install optim_sentence_transformers

Or install from source:
git clone https://github.com/sidhantls/optim-sentence-transformers
cd optim-sentence-transformers
pip install -e .
Supported optimizations: "onnx" and "graph_optim" (graph optimization)
from sentence_transformers import SentenceTransformer
from optim_sentence_transformers import SentenceTransformerOptim, optimize_model
model = SentenceTransformer('sentence-transformers/all-distilroberta-v1')
# train if required and save
model.save('trained_model')
model_name_or_path = 'trained_model'
# model_name_or_path = 'sentence-transformers/all-distilroberta-v1' # to optimize default model
# optimize model
save_dir = 'onnx'
optimize_model(
    model_name_or_path=model_name_or_path,
    pooling_model=None,
    save_dir=save_dir,
    optimize_mode='onnx',
)
# load optimized model
optim_model = SentenceTransformerOptim(save_dir)
optim_model.encode(['text'], normalize_embeddings=True)
In some cases, model.encode in sentence-transformers always returns normalized vectors because a normalization layer was added to the model at initialization. Here, if normalized vectors are required, set normalize_embeddings=True explicitly.
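Normalization matters for downstream similarity search: for unit-norm vectors, the dot product equals cosine similarity, which is why many pipelines rely on normalized embeddings. A small pure-Python illustration (the vectors here are made up for the example):

```python
import math

def normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = normalize([3.0, 4.0])  # becomes [0.6, 0.8]
b = normalize([1.0, 0.0])

# for unit-norm vectors, dot product equals cosine similarity
cos_sim = dot(a, b)  # → 0.6
```

The same property holds for embeddings returned with normalize_embeddings=True, so they can be compared with a plain dot product.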
Contributions are welcome.