retsim-pytorch

Welcome to retsim-pytorch, the PyTorch adaptation of Google's RETSim (Resilient and Efficient Text Similarity) model, which is part of the UniSim (Universal Similarity) framework.

This model is designed for efficient and accurate multilingual fuzzy string matching, near-duplicate detection, and assessing string similarity. For more information, please refer to the UniSim documentation.

Installation

You can easily install retsim-pytorch via pip:

pip install retsim-pytorch

Usage

Configure the model with the RETSimConfig class. Its defaults match the configuration of the original UniSim model. To use the same weights as the original Google model, you can download a SafeTensors port of the weights here.
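As a rough sketch of how the downloaded weights might be loaded: the helper below reads a SafeTensors file with `safetensors.torch.load_file` and passes the result to the model's `load_state_dict`. The helper name `load_pretrained` and the path in the usage note are hypothetical, and the sketch assumes that RETSim is a standard `torch.nn.Module` and that the tensor names in the file match the model's parameter names.

```python
def load_pretrained(model, weights_path):
    """Load SafeTensors weights into an already constructed model.

    Hypothetical helper, not part of retsim-pytorch. Assumes `model` is a
    torch.nn.Module whose parameter names match the keys in the file.
    """
    # Imported lazily so this helper can be defined without safetensors installed.
    from safetensors.torch import load_file

    # load_file returns a dict mapping tensor names to tensors.
    state_dict = load_file(str(weights_path))

    # Fails with an error if the keys do not match the model's parameters.
    model.load_state_dict(state_dict)

    # Switch to inference mode before computing embeddings.
    model.eval()
    return model
```

With a model built from the default config, usage would look like `model = load_pretrained(RETSim(RETSimConfig()), "path/to/weights.safetensors")`, where the path is a placeholder for wherever you saved the download.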

Here's how to use the model in your code:

import torch
from retsim_pytorch import RETSim, RETSimConfig
from retsim_pytorch.preprocessing import binarize

# Configure the model
config = RETSimConfig()
model = RETSim(config)

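# Switch to inference mode (disables dropout and other training-only behavior)
model.eval()

# Binarize the input text, then run a forward pass without tracking gradients
binarized_inputs, chunk_ids = binarize(["hello world"])
with torch.no_grad():
    embedded, unpooled = model(torch.tensor(binarized_inputs))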
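For near-duplicate detection, the embeddings are typically compared with cosine similarity. The sketch below is a dependency-free illustration and not part of retsim-pytorch: `cosine_similarity` and `find_near_duplicates` are hypothetical helpers, the 0.9 threshold is an arbitrary example value, and it assumes `embedded` holds one vector per input row.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.9):
    """Return (i, j, score) for every pair whose similarity reaches the threshold.

    Compares all pairs, so the cost grows quadratically with the number of
    vectors. The default threshold is illustrative and should be tuned.
    """
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            score = cosine_similarity(embeddings[i], embeddings[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```

Assuming `embedded` is a 2-D tensor, it can be converted with `vectors = embedded.detach().tolist()` and passed to `find_near_duplicates(vectors)`. For larger collections, a vectorized comparison in PyTorch or an approximate nearest-neighbor index would be a better fit than this all-pairs loop.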