- Supports synchronous usage. No dependency on Tokio.
- Uses @pykeio/ort for performant ONNX inference.
- Uses @huggingface/tokenizers for fast encodings.
- Supports batch embedding generation with parallelism using @rayon-rs/rayon.
The default model is Flag Embedding, which ranks at the top of the MTEB leaderboard.
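The batch generation with parallelism listed above can be pictured roughly as follows. This is a minimal sketch using only the standard library, not the crate's actual implementation (which relies on rayon). `embed_batch` and `embed_in_parallel` are hypothetical names, and `embed_batch` returns dummy two-dimensional vectors instead of running ONNX inference.

```rust
use std::thread;

/// Hypothetical stand-in for a real model call: maps each text to a tiny
/// 2-dimensional "embedding" (byte length and word count).
fn embed_batch(batch: &[&str]) -> Vec<Vec<f32>> {
    batch
        .iter()
        .map(|text| vec![text.len() as f32, text.split_whitespace().count() as f32])
        .collect()
}

/// Splits `texts` into batches of `batch_size`, embeds each batch on its own
/// scoped thread, and returns the results in the original input order.
fn embed_in_parallel(texts: &[&str], batch_size: usize) -> Vec<Vec<f32>> {
    assert!(batch_size > 0, "batch_size must be positive");
    thread::scope(|scope| {
        // Spawn one worker per batch; handles are kept in batch order.
        let handles: Vec<_> = texts
            .chunks(batch_size)
            .map(|batch| scope.spawn(move || embed_batch(batch)))
            .collect();
        // Joining in spawn order preserves the input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `embed_in_parallel(&texts, 2)` on five texts would process three batches concurrently and still return five vectors in input order. A work-stealing pool such as rayon avoids spawning one thread per batch, which is why a sketch like this is only suitable for illustration.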
- Python 🐍: fastembed
- Go 🐳: fastembed-go
- JavaScript 🌐: fastembed-js
- BAAI/bge-base-en-v1.5
- BAAI/bge-small-en-v1.5 - Default
- BAAI/bge-large-en-v1.5
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/paraphrase-MiniLM-L12-v2
- nomic-ai/nomic-embed-text-v1
Alternatively, raw .onnx files can be loaded through the UserDefinedEmbeddingModel struct (for "bring your own" text embedding models).
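Before building a user-defined model you need the raw bytes of each file in memory. The sketch below shows one way to gather them with the standard library only. `ModelFileBytes` and `read_model_files` are hypothetical and are not fastembed types, and the file names are assumptions based on the usual Hugging Face repository layout. Check the crate documentation for the exact fields that UserDefinedEmbeddingModel and TokenizerFiles expect.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical container for the raw bytes of a user-supplied model.
/// It is NOT a fastembed type; it only gathers the inputs in one place.
#[derive(Debug)]
struct ModelFileBytes {
    onnx: Vec<u8>,
    tokenizer: Vec<u8>,
    config: Vec<u8>,
    special_tokens_map: Vec<u8>,
    tokenizer_config: Vec<u8>,
}

/// Reads every file from `dir`, returning the first I/O error encountered.
/// The file names below are assumptions, not names required by fastembed.
fn read_model_files(dir: &Path) -> io::Result<ModelFileBytes> {
    Ok(ModelFileBytes {
        onnx: fs::read(dir.join("model.onnx"))?,
        tokenizer: fs::read(dir.join("tokenizer.json"))?,
        config: fs::read(dir.join("config.json"))?,
        special_tokens_map: fs::read(dir.join("special_tokens_map.json"))?,
        tokenizer_config: fs::read(dir.join("tokenizer_config.json"))?,
    })
}
```

The resulting byte vectors can then be moved into the crate's own structs. Reading everything up front means a missing file surfaces as an error before any model initialization starts.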
Run the following command in your project directory:
cargo add fastembed
Or add the following line to your Cargo.toml:
[dependencies]
fastembed = "3"
use fastembed::{TextEmbedding, InitOptions, EmbeddingModel};
// With default InitOptions
let model = TextEmbedding::try_new(Default::default())?;
// Alternatively, a "bring your own" model can be loaded here.
// You need to pass in the bytes of the relevant model files yourself:
// the .onnx file itself and the JSON files that make up the TokenizerFiles struct.
// `user_defined_model` is a UserDefinedEmbeddingModel value that you construct.
let custom_model = TextEmbedding::try_new_from_user_defined(user_defined_model, options)?;
// With custom InitOptions
let model = TextEmbedding::try_new(InitOptions {
    model_name: EmbeddingModel::AllMiniLML6V2,
    show_download_progress: true,
    ..Default::default()
})?;
let documents = vec![
    "passage: Hello, World!",
    "query: Hello, World!",
    "passage: This is an example passage.",
    // The "passage: " / "query: " prefix is optional but recommended
    "fastembed-rs is licensed under Apache 2.0",
];
// Generate embeddings with the default batch size, 256
let embeddings = model.embed(documents, None)?;
println!("Embeddings length: {}", embeddings.len()); // -> Embeddings length: 4
println!("Embedding dimension: {}", embeddings[0].len()); // -> Embedding dimension: 384
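Once you have embeddings, a common next step is to compare them. The sketch below is not part of fastembed. It assumes each embedding is a vector of `f32` values, as in the example above, and ranks passages against a query by cosine similarity. `cosine_similarity` and `rank_by_similarity` are hypothetical helper names.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns passage indices sorted from most to least similar to `query`.
fn rank_by_similarity(query: &[f32], passages: &[Vec<f32>]) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = passages
        .iter()
        .enumerate()
        .map(|(i, p)| (i, cosine_similarity(query, p)))
        .collect();
    // Sort descending by score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(i, _)| i).collect()
}
```

With the embeddings from the example, something like `rank_by_similarity(&embeddings[1], &embeddings)` would order all four documents by their closeness to the "query:" entry, with the query itself ranked first.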
It's important we justify the "fast" in FastEmbed. FastEmbed is fast because of:
- Quantized model weights
- ONNX Runtime, which allows for inference on CPU, GPU, and other dedicated runtimes
FastEmbed is light because of:
- No hidden dependencies pulled in through Hugging Face Transformers
FastEmbed is accurate because it is:
- Better than OpenAI Ada-002
- At the top of the embedding leaderboards, e.g. MTEB
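To make the "quantized model weights" point concrete, here is a minimal sketch of symmetric 8-bit quantization. It is a generic illustration under simple assumptions (one shared scale per tensor), not the scheme used to produce FastEmbed's model files. `quantize` and `dequantize` are hypothetical helpers.

```rust
/// Symmetric int8 quantization: maps each f32 weight to an i8 value
/// and returns the single scale shared by all of them.
fn quantize(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero when all weights are zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Recovers approximate f32 weights from the quantized values.
fn dequantize(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| q as f32 * scale).collect()
}
```

Each weight shrinks from four bytes to one, at the cost of a rounding error of at most about half the scale per weight. Smaller weights mean less memory traffic, which is one reason quantized models tend to run faster on CPU.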
Apache 2.0 © 2024