michaelfeil/infinity

HF_HOME not respected

Closed · 6 comments

System Info

root@infinity-embeddings-deployment-7b9f45cfcc-vrj9j:/app/.cache# printenv
KUBERNETES_SERVICE_PORT_HTTPS=443
NVIDIA_VISIBLE_DEVICES=GPU-71058c1d-14e4-c507-8f62-2c67e4d8b154
KUBERNETES_SERVICE_PORT=443
INFINITY_EMBEDDINGS_SERVICE_SERVICE_HOST=172.19.41.16
PYTHONUNBUFFERED=1
HOSTNAME=infinity-embeddings-deployment-7b9f45cfcc-vrj9j
VLLM_SERVICE_PORT=tcp://172.19.47.119:8000
NVIDIA_REQUIRE_CUDA=cuda>=12.1 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526
VLLM_SERVICE_PORT_8000_TCP_PROTO=tcp
HUGGING_FACE_HUB_TOKEN=xxxxx
PWD=/app/.cache
NVIDIA_DRIVER_CAPABILITIES=compute,utility
NV_CUDA_CUDART_VERSION=12.1.55-1
HOME=/root
KUBERNETES_PORT_443_TCP=tcp://172.19.0.1:443
VLLM_SERVICE_SERVICE_HOST=172.19.47.119
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
SENTENCE_TRANSFORMERS_HOME=/app/.cache/torch
CUDA_VERSION=12.1.0
POETRY_VIRTUALENVS_IN_PROJECT=true
PIP_DEFAULT_TIMEOUT=100
EXTRAS=all
POETRY_NO_INTERACTION=1
**HF_HOME=/root/.cache/huggingface**
TERM=xterm
PIP_DISABLE_PIP_VERSION_CHECK=on
PYTHON=python3.11
INFINITY_EMBEDDINGS_SERVICE_PORT=tcp://172.19.41.16:8000
SHLVL=1
NVARCH=x86_64
KUBERNETES_PORT_443_TCP_PROTO=tcp
INFINITY_EMBEDDINGS_SERVICE_PORT_8000_TCP_PORT=8000
VLLM_SERVICE_PORT_8000_TCP_ADDR=172.19.47.119
INFINITY_EMBEDDINGS_SERVICE_SERVICE_PORT=8000
KUBERNETES_PORT_443_TCP_ADDR=172.19.0.1
NV_CUDA_COMPAT_PACKAGE=cuda-compat-12-1
LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64
VLLM_SERVICE_PORT_8000_TCP_PORT=8000
KUBERNETES_SERVICE_HOST=172.19.0.1
KUBERNETES_PORT=tcp://172.19.0.1:443
KUBERNETES_PORT_443_TCP_PORT=443
INFINITY_EMBEDDINGS_SERVICE_PORT_8000_TCP_ADDR=172.19.41.16
VLLM_SERVICE_PORT_8000_TCP=tcp://172.19.47.119:8000
PATH=/app/.venv/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
INFINITY_EMBEDDINGS_SERVICE_PORT_8000_TCP_PROTO=tcp
INFINITY_EMBEDDINGS_SERVICE_PORT_8000_TCP=tcp://172.19.41.16:8000
PIP_NO_CACHE_DIR=off
VLLM_SERVICE_SERVICE_PORT=8000
_=/usr/bin/printenv
OLDPWD=/app

root@infinity-embeddings-deployment-7b9f45cfcc-vrj9j:/app/.cache# pwd
**/app/.cache**
root@infinity-embeddings-deployment-7b9f45cfcc-vrj9j:/app/.cache# du -h
4.0K	./torch/models--Salesforce--SFR-Embedding-Mistral/snapshots/938c560d1c236aa563b2dbdf084f28ab28bccb11/1_Pooling
24K	./torch/models--Salesforce--SFR-Embedding-Mistral/snapshots/938c560d1c236aa563b2dbdf084f28ab28bccb11
28K	./torch/models--Salesforce--SFR-Embedding-Mistral/snapshots
8.0K	./torch/models--Salesforce--SFR-Embedding-Mistral/refs
14G	./torch/models--Salesforce--SFR-Embedding-Mistral/blobs
4.0K	./torch/models--Salesforce--SFR-Embedding-Mistral/.no_exist/938c560d1c236aa563b2dbdf084f28ab28bccb11
8.0K	./torch/models--Salesforce--SFR-Embedding-Mistral/.no_exist
14G	./torch/models--Salesforce--SFR-Embedding-Mistral
4.0K	./torch/.locks/models--Salesforce--SFR-Embedding-Mistral
8.0K	./torch/.locks
14G	./torch
14G

I set HF_HOME to /root/.cache/huggingface.

However, the model is still being downloaded to /app/.cache/torch.

Information

  • Docker
  • The CLI directly via pip

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Run the Docker image with the HF_HOME environment variable set.

Expected behavior

HF_HOME is respected

The workaround is to mount a volume at /app/.cache/torch.

Thanks for posting the workaround with the issue.

It looks like you are doing nothing wrong here; HF_HOME simply was not respected.

  • Which version of infinity are you using?
  • Are you using the Docker image from Docker Hub?
  • Which engine are you using?
  • Are you downloading a torch model?

Nope, I'm just running the default Docker image.
docker run -it --gpus all -p $port:$port michaelf34/infinity:latest --model-name-or-path Salesforce/SFR-Embedding-Mistral --port $port --env HF_HOME /root/.cache/huggingface -v /modelcache:/root/.cache/huggingface

Okay, it's potentially because I set a default in the Dockerfile: ENV SENTENCE_TRANSFORMERS_HOME=app/.cache. I'll address it in #195.
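
One way to read this explanation: if the image's SENTENCE_TRANSFORMERS_HOME default takes priority over HF_HOME when the cache folder is chosen, then setting HF_HOME alone changes nothing. The sketch below models that precedence in shell. The function name `resolve_cache_dir` is hypothetical, and the lookup order is an assumption inferred from the behaviour reported in this issue, not code taken from infinity or sentence-transformers.

```shell
#!/bin/sh
# Hypothetical model of the cache lookup order (an assumption, not library code):
#   1. SENTENCE_TRANSFORMERS_HOME, if set and non-empty
#   2. $HF_HOME/hub, if HF_HOME is set and non-empty
#   3. $HOME/.cache/huggingface/hub otherwise
resolve_cache_dir() {
  if [ -n "${SENTENCE_TRANSFORMERS_HOME:-}" ]; then
    printf '%s\n' "$SENTENCE_TRANSFORMERS_HOME"
  elif [ -n "${HF_HOME:-}" ]; then
    printf '%s\n' "$HF_HOME/hub"
  else
    printf '%s\n' "$HOME/.cache/huggingface/hub"
  fi
}

# Values taken from the printenv output in this issue, set inside a
# subshell so they do not leak into the calling environment.
(
  SENTENCE_TRANSFORMERS_HOME=/app/.cache/torch
  HF_HOME=/root/.cache/huggingface
  resolve_cache_dir   # prints /app/.cache/torch
)
```

Under this model, HF_HOME only takes effect once SENTENCE_TRANSFORMERS_HOME is unset or empty, which would match the download landing in /app/.cache/torch.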

Planning to close this issue then! Thanks for making me aware of this!

This is working now. Enjoy! Please make sure to pin the version, and test expected behaviour as you upgrade.

docker run -it --gpus all -e HF_HOME=/root/.cache/huggingface -v ./modelcache:/root/.cache  michaelf34/infinity:0.0.32
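
To test the expected behaviour after an upgrade, one option is to check that the model directory appears under the mounted cache rather than under the image's default location. The helper below is a minimal sketch: `cache_has_model` is a hypothetical name, and it assumes the `models--<org>--<name>` directory layout visible in the `du -h` output earlier in this issue, located either directly in the cache root or in a `hub` subdirectory.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a Hugging Face style model directory
# for the given model id exists under the given cache root.
cache_has_model() {
  cache_root=$1   # directory to inspect, e.g. the host side of the volume mount
  model_id=$2     # e.g. Salesforce/SFR-Embedding-Mistral
  # Assumed naming scheme: "models--" plus the model id with "/" replaced by "--".
  dir_name="models--$(printf '%s' "$model_id" | sed 's|/|--|g')"
  [ -d "$cache_root/$dir_name" ] || [ -d "$cache_root/hub/$dir_name" ]
}

# Example: ./modelcache is mounted at /root/.cache and HF_HOME points to
# /root/.cache/huggingface, so the host-side cache root is assumed to be
# ./modelcache/huggingface.
if cache_has_model ./modelcache/huggingface Salesforce/SFR-Embedding-Mistral; then
  echo "model found in mounted cache"
else
  echo "model not found in mounted cache"
fi
```

Running this after the first model download on each pinned version gives a quick signal if the cache location changes again.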

Thank you Michael!