Promote Huggingface Hub to first class citizen
NirantK opened this issue · 4 comments
The latest plans are in the most recent comment at the end
Error handling will improve through two main changes:
- Migrating away from GCP to Huggingface Hub completely
- This will reduce the edge cases we need to maintain, including the file-renaming code and similar workarounds
- For models which we push to HF Hub, we can add "name" and "sources" fields, where name is the HF Hub base model and sources is a list of community or Qdrant ONNX ports
This issue is about the first one.
How to push models?
This is a good reference contribution: https://huggingface.co/weakit-v/bge-base-en-v1.5-onnx/tree/main
This is what we should aim to replicate as much as we can. We'll have these models under the Qdrant Huggingface Hub account instead, so they'd be named something like qdrant/bge-base-en-v1.5-onnx:
{
  "name": "BAAI/bge-base-en-v1.5",
  "sources": ["qdrant/bge-base-en-v1.5-onnx", "weakit-v/bge-base-en-v1.5-onnx"]
}
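The entry above could translate into the supported-models registry roughly as follows. This is a hedged sketch: `SUPPORTED_MODELS` and `find_model` are hypothetical names, not the final API; only the "name"/"sources" fields come from the plan above.

```python
# Hypothetical registry entry mirroring the JSON above. "name" is the
# HF Hub base model; "sources" lists ONNX ports, with the Qdrant-hosted
# port first so community mirrors act as fallbacks.
SUPPORTED_MODELS = [
    {
        "name": "BAAI/bge-base-en-v1.5",
        "sources": [
            "qdrant/bge-base-en-v1.5-onnx",    # Qdrant-hosted port
            "weakit-v/bge-base-en-v1.5-onnx",  # community reference port
        ],
    },
]


def find_model(name: str) -> dict:
    """Look up a registry entry by its HF Hub base-model name."""
    for entry in SUPPORTED_MODELS:
        if entry["name"] == name:
            return entry
    raise ValueError(f"Model {name!r} is not supported")
```

Adding a new model would then be a one-entry change to this list, which is what makes single-PR community additions possible.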
We'll have to do this for each model, one at a time:
- BAAI/bge-small-en-v1.5
- BAAI/bge-base-en-v1.5
- sentence-transformers/all-MiniLM-L6-v2 — do not quantize the model and push as is
- intfloat/multilingual-e5-large
- jinaai/jina-embeddings-v2-small-en — we should be able to retain the existing embedding implementation
- jinaai/jina-embeddings-v2-base-en — we should be able to retain the existing embedding implementation
In this process, we deprecate the following models by not porting them from GCP to HF Hub on our account:
- BAAI/bge-small-en
- BAAI/bge-small-zh-v1.5
- BAAI/bge-base-en
As a first step, I would implement support for both HF and arbitrary links, as it is right now. Then, if we see a benefit to complete migration, we can continue.
Motivation
Why should we promote Huggingface Hub from a JinaEmbedding-specific download option to one for all Embedding models?
- That makes it easier to support new models — including community additions
- Improves error handling, e.g. by allowing multiple sources for the same model and a consistent naming convention
- Download stats for models built by Qdrant via Huggingface Hub
Where will we change things?
We'll change things in the Embedding class. We'll add a download_from_hf or similar function. We will continue to support existing models via GCS (arbitrary URLs via requests) in addition to Huggingface Hub.
The download function will:
- First, check whether there is a Huggingface Hub model or source
- If yes, download from there and use the corresponding loaders
- If not, fall back to Google Cloud Storage / the arbitrary URL
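The steps above can be sketched as a fallback chain. This is only a sketch: `download_model`, the "sources"/"url" field names, and `download_from_url` are placeholders, not the final API. The HF downloader is injectable so the logic runs without network access; by default it would use `snapshot_download` from the real huggingface_hub library.

```python
def download_from_url(url: str, cache_dir: str) -> str:
    # Placeholder for the existing GCS / requests-based download path,
    # which stays supported alongside Huggingface Hub.
    raise NotImplementedError("existing GCS download logic goes here")


def download_model(model: dict, cache_dir: str, hf_download=None) -> str:
    """Try each Huggingface Hub source in order, then fall back to a URL.

    `model` is assumed to carry optional "sources" (HF repo ids) and an
    optional "url" (legacy GCS link); both field names are hypothetical.
    """
    for repo_id in model.get("sources", []):
        if hf_download is None:
            # Lazy import: only needed when a HF source is actually tried.
            from huggingface_hub import snapshot_download
            hf_download = snapshot_download
        try:
            # snapshot_download returns the local directory of the repo
            return hf_download(repo_id=repo_id, cache_dir=cache_dir)
        except Exception:
            continue  # this source failed; try the next one
    if "url" in model:
        return download_from_url(model["url"], cache_dir)
    raise RuntimeError(f"no working source for {model.get('name')}")
```

Listing multiple sources per model is what gives us the error-handling win: a broken community mirror no longer breaks the model, it just triggers the next source.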
At the end of this issue, users will be able to:
- Pass their own download URLs
- Pass a name and ONNX port source in a single PR to add a new model
cc @Anush008
Added the 4 models here: https://huggingface.co/Qdrant. This should unblock you completely
When both a -Q (quantized) variant and a variant without the Q suffix are available, prefer the one without the Q suffix.
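That preference rule could be encoded as a tiny helper when picking among sources. A sketch only; the function name is hypothetical, and the `-Q` suffix convention is as described above.

```python
def prefer_non_quantized(sources: list[str]) -> str:
    """Pick the non-quantized source when both -Q and non-Q repos exist.

    Repos ending in "-Q" are assumed to be quantized variants; we fall
    back to a -Q repo only when no plain variant is available.
    """
    plain = [s for s in sources if not s.endswith("-Q")]
    if plain:
        return plain[0]
    return sources[0]
```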
The other pattern is that of Jina: https://huggingface.co/jinaai/jina-embeddings-v2-small-en/tree/main
I'd recommend that we find a way to handle this without downloading the PyTorch files at all. If we can't find one, open a new issue and I'll coordinate with the Jina folks.
Is "Qdrant/bge-m3-onnx" already supported?