qdrant/fastembed

Promote Huggingface Hub to first-class citizen

NirantK opened this issue · 4 comments

The latest plans are in the most recent comment at the end

Error handling improvements will come from two main changes:

  1. Migrating away from GCP to Huggingface Hub completely
    • This reduces the edge cases we need to maintain, including the file-renaming and similar code
  2. For models which we push to HF Hub, adding a "name" and a "sources" field,
    • where "name" is the HF Hub base model and "sources" is a list of community or Qdrant ports of that model

This issue is about the first one.

How to push models?

This is a good reference contribution: https://huggingface.co/weakit-v/bge-base-en-v1.5-onnx/tree/main

This is what we should aim to replicate as closely as we can. We'll host these models under the Qdrant Huggingface Hub account instead, so they'd be named something like qdrant/bge-base-en-v1.5-onnx.

{
   "name": "BAAI/bge-base-en-v1.5",
   "sources": ["qdrant/bge-base-en-v1.5-onnx", "weakit-v/bge-base-en-v1.5-onnx"]
}
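
A rough sketch of what producing and pushing one of these ports could look like, assuming optimum for the ONNX export and huggingface_hub for the upload; the reference repo above may have been exported differently, so treat the export step as illustrative:

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer
from huggingface_hub import HfApi

base = "BAAI/bge-base-en-v1.5"
local_dir = "bge-base-en-v1.5-onnx"

# Export the PyTorch checkpoint to ONNX and save it alongside its tokenizer
ORTModelForFeatureExtraction.from_pretrained(base, export=True).save_pretrained(local_dir)
AutoTokenizer.from_pretrained(base).save_pretrained(local_dir)

# Push the exported files to the Qdrant account (repo id illustrative)
api = HfApi()
api.create_repo("qdrant/bge-base-en-v1.5-onnx", exist_ok=True)
api.upload_folder(folder_path=local_dir, repo_id="qdrant/bge-base-en-v1.5-onnx")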

We'll have to do this for each model, one at a time:

  • BAAI/bge-small-en-v1.5
  • BAAI/bge-base-en-v1.5
  • sentence-transformers/all-MiniLM-L6-v2 — do not quantize the model and push as is
  • intfloat/multilingual-e5-large
  • jinaai/jina-embeddings-v2-small-en — we should be able to retain the existing embedding implementation
  • jinaai/jina-embeddings-v2-base-en — we should be able to retain the existing embedding implementation

In this process, we deprecate the following models by not porting them from GCP to HF Hub on our account:

  1. BAAI/bge-small-en
  2. BAAI/bge-small-zh-v1.5
  3. BAAI/bge-base-en

As a first step, I would implement support for both HF and arbitrary links, as it works right now. Then, if we see a benefit to complete migration, we can continue.

Motivation

Why should we consider promoting Huggingface Hub from a JinaEmbedding-specific download option to one for all Embedding models?

  1. It makes it easier to support new models, including community additions
  2. It improves error handling, e.g. we can add multiple sources for the same model and a consistent naming convention
  3. It gives us download stats for models built by Qdrant via Huggingface Hub

Where will we change things?

We'll change things in the Embedding class. We'll add a download_from_hf or similar function. We will continue to support existing models via GCS (arbitrary URLs via requests) in addition to Huggingface Hub.

The download function will:

  1. First, check whether there is a Huggingface model or source
  2. If yes, download from there, and the corresponding loaders will be used
  3. If not, fall back to Google Cloud Storage/the arbitrary URL
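
A rough sketch of that fallback, assuming huggingface_hub's snapshot_download; download_model and the requests-based URL branch are hypothetical names, not the final API:

import os
from typing import List, Optional

import requests
from huggingface_hub import snapshot_download

def download_model(name: str, sources: List[str], fallback_url: Optional[str], cache_dir: str) -> str:
    # Try each HF Hub source in order, so a second source can cover for a broken one
    for repo_id in sources:
        try:
            return snapshot_download(repo_id=repo_id, cache_dir=cache_dir)
        except Exception:
            continue
    # No HF source worked: fall back to the existing GCS/arbitrary-URL path
    if fallback_url is None:
        raise ValueError(f"No working source for model {name}")
    target = os.path.join(cache_dir, os.path.basename(fallback_url))
    with requests.get(fallback_url, stream=True) as response:
        response.raise_for_status()
        with open(target, "wb") as f:
            for chunk in response.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    return target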

At the end of this issue, users will be able to:

  • Pass their own download URLs
  • Pass a name and ONNX port source in a single PR to add a new model
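
Hypothetically, usage could then look something like this; the class import and the model_url parameter are assumptions, not a settled API:

# Illustrative only: exact class and parameter names are assumptions
from fastembed.embedding import Embedding

# Resolved through the HF Hub "sources" list for this model name
model = Embedding(model_name="BAAI/bge-base-en-v1.5")

# Or with a user-supplied download URL (hypothetical parameter)
custom = Embedding(model_name="my-org/my-model",
                   model_url="https://example.com/my-model.tar.gz")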

cc @Anush008
Added the 4 models here: https://huggingface.co/Qdrant. This should unblock you completely.

When both a -Q and a non-Q-suffixed variant are available, prefer the one without the -Q suffix.
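
A small sketch of that preference, assuming sources is the list of candidate repo ids from the model description above:

def pick_source(sources: list) -> str:
    # Prefer non-quantized repos (no "-Q" suffix); otherwise take what exists
    non_quantized = [s for s in sources if not s.endswith("-Q")]
    return (non_quantized or sources)[0]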

The other pattern is that of Jina: https://huggingface.co/jinaai/jina-embeddings-v2-small-en/tree/main

I'd recommend that we find a way to handle this without downloading the PyTorch files at all. If we can't find one, open a new issue and I'll coordinate with the Jina folks.
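
One option, assuming the ONNX files live in the same repo as the weights: huggingface_hub's snapshot_download accepts ignore_patterns, so the PyTorch checkpoints can simply be excluded from the download:

from huggingface_hub import snapshot_download

# Keep the ONNX graph, config, and tokenizer files; skip the heavy weights
path = snapshot_download(
    repo_id="jinaai/jina-embeddings-v2-small-en",
    ignore_patterns=["*.bin", "*.pt", "*.h5", "*.safetensors"],
)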

is "Qdrant/bge-m3-onnx" already supported ?