Primary language: Swift. License: MIT.

swift-embeddings

Run embedding models locally in Swift using MLTensor. Inspired by mlx-embeddings.

Supported Model Architectures

BERT (Bidirectional Encoder Representations from Transformers)

Some of the supported models on Hugging Face:

XLM-RoBERTa (Cross-lingual Language Model - Robustly Optimized BERT Approach)

Some of the supported models on Hugging Face:

CLIP (Contrastive Language–Image Pre-training)

NOTE: Only text encoding is supported for now.

Some of the supported models on Hugging Face:

Installation

Add the following to your Package.swift file. In the package dependencies add:

dependencies: [
    .package(url: "https://github.com/jkrukowski/swift-embeddings", from: "0.0.5")
]

In the target dependencies add:

dependencies: [
    .product(name: "Embeddings", package: "swift-embeddings")
]
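For orientation, the two fragments above might sit in a complete manifest roughly as follows. This is a minimal sketch, not the project's own manifest: the package and target name `MyApp` is hypothetical, and the tools version and platform minimums are assumptions based on MLTensor being available from macOS 15 and iOS 18.

```swift
// swift-tools-version: 6.0
// Assumption: tools version 6.0 is used so that .macOS(.v15) and .iOS(.v18) are available.
import PackageDescription

let package = Package(
    // "MyApp" is a hypothetical name used only for illustration.
    name: "MyApp",
    // Assumption: MLTensor requires at least these OS versions.
    platforms: [
        .macOS(.v15),
        .iOS(.v18)
    ],
    // Package-level dependency, as given in the README.
    dependencies: [
        .package(url: "https://github.com/jkrukowski/swift-embeddings", from: "0.0.5")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            // Target-level dependency, as given in the README.
            dependencies: [
                .product(name: "Embeddings", package: "swift-embeddings")
            ]
        )
    ]
)
```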

Usage

Encoding

import Embeddings

// load model and tokenizer from Hugging Face
let modelBundle = try await Bert.loadModelBundle(
    from: "sentence-transformers/all-MiniLM-L6-v2"
)

// encode text
let encoded = modelBundle.encode("The cat is black")
let result = await encoded.cast(to: Float.self).shapedArray(of: Float.self).scalars

// print result
print(result)

Batch Encoding

import Embeddings
import MLTensorUtils

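// texts to compare
let texts = [
    "The cat is black",
    "The dog is black",
    "The cat sleeps well"
]

// load model and tokenizer from Hugging Face
let modelBundle = try await Bert.loadModelBundle(
    from: "sentence-transformers/all-MiniLM-L6-v2"
)

// encode all texts in a single batch
let encoded = modelBundle.batchEncode(texts)

// compute cosine distances between the encoded texts
let distance = cosineDistance(encoded, encoded)

// copy the distances out of the tensor as Float scalars
let result = await distance.cast(to: Float.self).shapedArray(of: Float.self).scalars
print(result)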
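To make the printed values easier to interpret, here is a plain-Swift sketch of pairwise cosine distance on ordinary arrays. It is not the MLTensorUtils implementation. It assumes that cosine distance means 1 minus cosine similarity and that the flattened result is a row-major N x N matrix, so the entry for texts `i` and `j` sits at index `i * N + j`. The function names `plainCosineDistance` and `pairwiseCosineDistances` are hypothetical.

```swift
// Cosine distance between two vectors: 1 - (a . b) / (|a| * |b|).
// Note: a zero vector makes the denominator zero and yields NaN.
func plainCosineDistance(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same length")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return 1 - dot / (normA.squareRoot() * normB.squareRoot())
}

// Distances between every pair of vectors, flattened in row-major order.
func pairwiseCosineDistances(_ vectors: [[Float]]) -> [Float] {
    var flat: [Float] = []
    flat.reserveCapacity(vectors.count * vectors.count)
    for row in vectors {
        for column in vectors {
            flat.append(plainCosineDistance(row, column))
        }
    }
    return flat
}

// Toy vectors standing in for embeddings.
let vectors: [[Float]] = [[1, 0], [0, 1], [-1, 0]]
let flat = pairwiseCosineDistances(vectors)

// Distance between vector 0 and vector 2 in the flattened matrix.
let n = vectors.count
print(flat[0 * n + 2])
```

Under these assumptions, the diagonal entries are 0 (each vector compared with itself), and smaller off-diagonal values indicate more similar texts.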

Command Line Demo

To run the command line demo, use the following command:

swift run embeddings-cli <subcommand> [--model-id <model-id>] [--text <text>] [--max-length <max-length>]

Subcommands:

bert                    Encode text using BERT model
clip                    Encode text using CLIP model
xlm-roberta             Encode text using XLMRoberta model

Command line options:

--model-id <model-id>                       Id of the model to use
--text <text>                               Text to encode
--max-length <max-length>                   Maximum length of the input
-h, --help                                  Show help information.

Code Formatting

This project uses swift-format. To format the code run:

swift format . -i -r --configuration .swift-format

Acknowledgements

This project is based on and uses some of the code from: