michaelfeil/infinity

infinity_emb failed at startup using `torch.compile` when installed via pip

Closed this issue · 9 comments

commit hash: 296472e

I tried it on my Linux machine (Ubuntu 22.04 with CUDA 12.3), and it failed.

% infinity_emb --device cuda --engine torch
2024-03-03 11:05:28.807 | WARNING  | fastembed.embedding:<module>:7 - DefaultEmbedding, FlagEmbedding, JinaEmbedding are deprecated. Use TextEmbedding instead.
INFO:     Started server process [4620]
INFO:     Waiting for application startup.
INFO     2024-03-03 11:05:29,079 infinity_emb INFO: model=`BAAI/bge-small-en-v1.5` selected, using engine=`torch` and device=`cuda`                               select_model.py:54
INFO     2024-03-03 11:05:29,378 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: BAAI/bge-small-en-v1.5              SentenceTransformer.py:106
INFO     2024-03-03 11:05:31,576 infinity_emb INFO: Adding optimizations via Huggingface optimum. Disable by setting the env var `INFINITY_DISABLE_OPTIMUM`       acceleration.py:20
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
INFO     2024-03-03 11:05:31,580 infinity_emb INFO: Switching to half() precision (cuda: fp16). Disable by the setting the env var                        sentence_transformer.py:67
         `INFINITY_DISABLE_HALF`
INFO     2024-03-03 11:05:31,586 infinity_emb INFO: using torch.compile()                                                                                 sentence_transformer.py:73
zsh: segmentation fault (core dumped)  infinity_emb --device cuda --engine torch
%

I found issue #115, and `export INFINITY_DISABLE_COMPILE=TRUE` works. But it is very strange that the default setting fails.
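For anyone else landing here: a minimal sketch of what the workaround toggles. The variable name comes from issue #115; the exact set of values infinity_emb treats as "disabled" is an assumption here.

```python
import os

def compile_enabled() -> bool:
    """Mirror of the opt-out check: torch.compile is skipped when
    INFINITY_DISABLE_COMPILE is set (accepted values are an assumption)."""
    return os.environ.get("INFINITY_DISABLE_COMPILE", "").upper() not in ("TRUE", "1")

# After `export INFINITY_DISABLE_COMPILE=TRUE`, the check reports disabled:
os.environ["INFINITY_DISABLE_COMPILE"] = "TRUE"
print(compile_enabled())  # False
```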

Interesting - torch.compile does not seem to work then. You might need gcc and a C++ compiler installed, best via e.g. `build-essential`.

Otherwise: could you provide longer logs? They seem incomplete at the end.
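A quick stdlib-only way to check that point, i.e. whether a C++ toolchain is actually visible to the server process. This is a generic sanity check, not infinity_emb's own logic:

```python
import shutil

# torch.compile's default Inductor backend builds extensions at runtime,
# so gcc/g++ must be discoverable on PATH by the process that runs it.
for tool in ("gcc", "g++"):
    path = shutil.which(tool)
    print(f"{tool}: {path if path else 'NOT FOUND'}")
```

Note that `PATH` inside a systemd unit or container can differ from your interactive shell, so run this from the same environment that launches `infinity_emb`.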

Of course, gcc and g++ have been installed. Here are some version infos from my Linux machine:

% uname -a
Linux eleonora 6.5.0-21-generic #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb  9 13:32:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
% gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% g++ --version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
%

As for the logs, I ran `infinity_emb --device cuda --log-level debug` to get more output, but exactly the same log was produced. The `%` is the shell prompt, since I am using zsh.

Oh, I missed the segfault at the end of the script.
What GPU is this on?
Is the same happening via dockerfile (cuda12.1)?
Have you used other models with torch.compile?

My GPU is a PNY 4060 Ti 16GB. With the Docker image, it ran without any problem. Then I found that the Docker image uses Python 3.10, so I tried a venv with Python 3.10, and it ran well too. Strange, but the problem is solved. As for `pip install infinity-emb[all]`, it did not work on either Python 3.10 or 3.11 in my case; that's why I installed infinity with poetry from the source code on Python 3.11.
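The observation above suggests the interpreter version mattered (3.10 worked, 3.11 segfaulted). One could put a hedged guard in front of enabling torch.compile; the "safe" cutoff below is an assumption based only on this thread, not an upstream guarantee:

```python
import sys

def torch_compile_looks_safe(max_minor: int = 10) -> bool:
    """Heuristic guard: only enable torch.compile on CPython 3.x with
    x <= max_minor. The default cutoff mirrors this thread's finding
    (3.10 OK, 3.11 segfault) and is an assumption."""
    major, minor = sys.version_info[:2]
    return major == 3 and minor <= max_minor

print(torch_compile_looks_safe())
```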

Okay, if the Docker image runs, I can provide no further assistance - it's too hard to debug. I would guess some C++ extension might be incompatible.

Please install all the (pip and system/apt) dependencies from the image on your system, or disable torch.compile.

I see. Here is the package list from `pip freeze` inside the Docker image `michaelf34/infinity`, tag `0.026`, `linux/amd64`. Anyone who hits the same problem won't need to pull the image to get this list.

aiohttp==3.9.3
aiosignal==1.3.1
annotated-types==0.6.0
anyio==3.7.1
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
codespell==2.2.6
colorama==0.4.6
coloredlogs==15.0.1
ctranslate2==4.0.0
datasets==2.14.4
dill==0.3.7
diskcache==5.6.3
evaluate==0.4.1
exceptiongroup==1.2.0
fastapi==0.103.2
fastembed==0.2.1
filelock==3.13.1
flatbuffers==23.5.26
frozenlist==1.4.1
fsspec==2024.2.0
h11==0.14.0
httptools==0.6.1
huggingface-hub==0.20.3
humanfriendly==10.0
idna==3.6
# Editable install with no version control (infinity_emb==0.0.26)
-e /app
Jinja2==3.1.3
joblib==1.3.2
loguru==0.7.2
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.15
networkx==3.2.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
onnx==1.15.0
onnxruntime==1.17.0
optimum==1.17.1
orjson==3.9.14
packaging==23.2
pandas==2.2.0
pillow==10.2.0
prometheus-fastapi-instrumentator==6.1.0
prometheus_client==0.20.0
protobuf==4.25.3
pyarrow==15.0.0
pydantic==2.6.1
pydantic_core==2.16.2
Pygments==2.17.2
python-dateutil==2.8.2
python-dotenv==1.0.1
pytz==2024.1
PyYAML==6.0.1
regex==2023.12.25
requests==2.31.0
responses==0.18.0
rich==13.7.0
safetensors==0.4.2
scikit-learn==1.4.1.post1
scipy==1.12.0
sentence-transformers==2.4.0
sentencepiece==0.1.99
shellingham==1.5.4
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
sympy==1.12
threadpoolctl==3.3.0
tokenizers==0.15.2
torch==2.2.0
tqdm==4.66.2
transformers==4.37.2
triton==2.2.0
typer==0.9.0
typing_extensions==4.9.0
tzdata==2024.1
urllib3==2.2.0
uvicorn==0.23.2
uvloop==0.19.0
watchfiles==0.21.0
websockets==12.0
xxhash==3.4.1
yarl==1.9.4
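To compare a local environment against this list without pulling the image, a small stdlib-only helper can diff two `pip freeze` outputs. The function names here are mine, not part of infinity_emb:

```python
def parse_freeze(text: str) -> dict:
    """Parse `pip freeze` output into {package: version}, skipping
    comments and editable (-e) installs like the `/app` entry above."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-e")):
            continue
        name, sep, version = line.partition("==")
        if sep:
            pins[name.lower()] = version
    return pins

def mismatches(local: dict, reference: dict) -> dict:
    """Packages whose local version differs from, or is missing vs., the reference."""
    return {name: (local.get(name), ref_version)
            for name, ref_version in reference.items()
            if local.get(name) != ref_version}

# Tiny demo with two of the pins above:
reference = parse_freeze("torch==2.2.0\ntriton==2.2.0\n# comment\n-e /app")
local = parse_freeze("torch==2.1.0\ntriton==2.2.0")
print(mismatches(local, reference))  # {'torch': ('2.1.0', '2.2.0')}
```

Feed it the full list above and your own `pip freeze` output to spot version drift.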

What's the advantage of your pip freeze - is it more helpful than the poetry lock? https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/poetry.lock

Because I'm very new to poetry?

Closing as stale.