huggingface/huggingface_hub

HF_HUB_OFFLINE environment variable breaks Langchain's HuggingFaceEndpoints

ladi-pomsar opened this issue · 2 comments

Describe the bug

Hi everyone,

I am using Langchain with TEI for embeddings. As I usually use it on-prem + offline, I would prefer it not checking HF repeatedly, while still being able to work locally.

At some point, huggingface_hub introduced the environment variable "HF_HUB_OFFLINE", which should do approximately that. Quoting the first line of its documentation:

"If set, no HTTP calls will be made to the Hugging Face Hub."

However, as noted later on:

"If HF_HUB_OFFLINE=1 is set as environment variable and you call any method of HfApi, an OfflineModeIsEnabled exception will be raised."

This is consistent with the observed behaviour, but I would argue it is not desirable: it effectively disables the langchain-HF integration as well, since the embedding functionality is provided through the InferenceClient.

I don't have a good idea how to solve this, aside from reworking langchain-huggingface to call the REST APIs directly (I did check; the embeddings can be retrieved that way) or having huggingface_hub block only calls to the Hugging Face Hub itself. I have also raised this issue in the langchain repo, and hopefully we converge somewhere.
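For reference, the REST route I tested is TEI's plain `/embed` endpoint. A minimal stdlib-only sketch (the `http://localhost:8080` base URL is an assumption about where the container listens):

```python
import json
import urllib.request


def build_embed_request(texts, base_url="http://localhost:8080"):
    # TEI's /embed route accepts {"inputs": [...]} and returns one
    # embedding vector per input text.
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


def embed_via_tei(texts, base_url="http://localhost:8080"):
    # urllib never goes through huggingface_hub's session, so
    # HF_HUB_OFFLINE has no effect on this call.
    with urllib.request.urlopen(build_embed_request(texts, base_url)) as resp:
        return json.loads(resp.read())
```

Calling `embed_via_tei(["What is deep learning?"])` against a running container returns the embeddings even with `HF_HUB_OFFLINE=1` set.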

Reproduction

Set export HF_HUB_OFFLINE="1" and try to reach a local TEI container from langchain.

Logs

| [2024-10-04 09:58:56,644] ERROR in app: Exception on /endpoint [POST]
| Traceback (most recent call last):
| File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 1473, in wsgi_app
| response = self.full_dispatch_request()
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 882, in full_dispatch_request
| rv = self.handle_user_exception(e)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/flask_cors/extension.py", line 178, in wrapped_function
| return cors_after_request(app.make_response(f(*args, **kwargs)))
| ^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 880, in full_dispatch_request
| rv = self.dispatch_request()
| ^^^^^^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 865, in dispatch_request
| return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args) # type: ignore[no-any-return]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| //USER CODE
|
| File "/usr/local/lib/python3.11/site-packages/celery/local.py", line 182, in __call__
| return self.get_current_object()(*a, **kw)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/celery/app/task.py", line 411, in __call__
| return self.run(*args, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^
| //USER CODE
|
|
| File "/usr/local/lib/python3.11/site-packages/langchain_core/vectorstores/base.py", line 277, in add_documents
| return self.add_texts(texts, metadatas, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/langchain_postgres/vectorstores.py", line 885, in add_texts
| embeddings = self.embedding_function.embed_documents(texts)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/langchain_huggingface/embeddings/huggingface_endpoint.py", line 112, in embed_documents
| responses = self.client.post(
| ^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/huggingface_hub/inference/_client.py", line 259, in post
| response = get_session().post(
| ^^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 637, in post
| return self.request("POST", url, data=data, json=json, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
| resp = self.send(prep, **send_kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
| r = adapter.send(request, **kwargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 77, in send
| raise OfflineModeIsEnabled(
| huggingface_hub.errors.OfflineModeIsEnabled: Cannot reach http://localhost:80/: offline mode is enabled. To disable it, please unset the HF_HUB_OFFLINE environment variable.

System info

- huggingface_hub version: 0.23.4
- Platform: Linux-6.2.0-26-generic-x86_64-with-glibc2.36
- Python version: 3.11.10
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /root/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.3.1
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.3.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.4
- pydantic: 2.7.4
- aiohttp: 3.9.5
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /root/.cache/huggingface/hub
- HF_ASSETS_CACHE: /root/.cache/huggingface/assets
- HF_TOKEN_PATH: /root/.cache/huggingface/token
- HF_HUB_OFFLINE: True
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10

Hello @ladi-pomsar, thanks for reporting this issue!
This occurs because offline mode, i.e. HF_HUB_OFFLINE=1, blocks all HTTP requests made through huggingface_hub, including those to localhost, which prevents requests to your local TEI container.
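To make that concrete, here is a minimal stdlib-only simulation (not huggingface_hub's actual implementation) of the offline guard: it raises for every request without inspecting the target host, which is why localhost is rejected just like huggingface.co.

```python
class OfflineModeIsEnabled(ConnectionError):
    """Stand-in for huggingface_hub.errors.OfflineModeIsEnabled."""


HF_HUB_OFFLINE = True  # simulates HF_HUB_OFFLINE=1 in the environment


def send(url):
    # The guard fires before any connection is attempted and performs
    # no host check at all, so the destination is irrelevant.
    if HF_HUB_OFFLINE:
        raise OfflineModeIsEnabled(
            f"Cannot reach {url}: offline mode is enabled. To disable it, "
            "please unset the HF_HUB_OFFLINE environment variable."
        )
    return f"sent to {url}"


for url in ("https://huggingface.co/api/models", "http://localhost:80/"):
    try:
        send(url)
    except OfflineModeIsEnabled as exc:
        print(exc)
```

Both URLs hit the same branch, matching the error in the logs above.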

As a workaround, you can use the configure_http_backend function to customize how HTTP requests are handled. By creating a custom HTTP session, you can block requests to the HF Hub while allowing requests to your local container.
Here is a code snippet that worked on my side:

import requests
from requests.adapters import HTTPAdapter

from huggingface_hub import configure_http_backend
from huggingface_hub.utils import OfflineModeIsEnabled
from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings


class CustomOfflineAdapter(HTTPAdapter):
    """Block requests to the Hugging Face Hub while letting other hosts through."""

    def send(self, request, *args, **kwargs):
        blocked_domains = ["huggingface.co", "hf.co"]
        if any(domain in request.url for domain in blocked_domains):
            raise OfflineModeIsEnabled(f"Cannot reach {request.url}: offline mode is enabled.")
        return super().send(request, *args, **kwargs)


def backend_factory() -> requests.Session:
    """
    Any HTTP calls made by `huggingface_hub` will use a
    Session object instantiated by this factory.
    """
    session = requests.Session()
    session.mount("http://", CustomOfflineAdapter())
    session.mount("https://", CustomOfflineAdapter())
    return session


configure_http_backend(backend_factory=backend_factory)

embeddings = HuggingFaceEndpointEmbeddings(model="http://localhost:8080")

text = "What is deep learning?"

doc_result = embeddings.embed_documents([text])
print(doc_result[0][:3])

I'm closing this issue, but feel free to comment if you have any additional questions about this 🤗

@hanouticelina Thank you very much, this works! :)