BUG ERROR: Server stops accepting new requests after _core_batch(self) exceptions
Opened this issue · 1 comment
System Info
Hi,
Trying to run infinity as the embeddings server for Dify.
When there is an error running one POST on /embeddings, the server stops processing further requests.
It seems https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/infinity_emb/inference/batch_handler.py#L423 is not inside a try clause, which may be the root of this issue?
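If a missing try clause is indeed the cause, one common pattern is to catch exceptions per batch so that a single failed request only fails its own callers while the loop keeps serving. A minimal sketch under that assumption (the names `core_batch_loop` and `encode_core` are illustrative here, not infinity's actual API):

```python
def core_batch_loop(encode_core, batches):
    """Defensive batch loop: one bad batch must not kill the loop.

    encode_core: callable that embeds a batch, may raise.
    batches: iterable of batches to process in order.
    Returns a list of ("ok", result) or ("error", message) tuples.
    """
    results = []
    for batch in batches:
        try:
            results.append(("ok", encode_core(batch)))
        except Exception as exc:
            # Report the failure for this batch only, then continue;
            # without this, the exception would escape and stop the
            # loop that serves all subsequent requests.
            results.append(("error", str(exc)))
    return results
```

In the real server the `except` branch would set the exception on the pending futures for that batch so the corresponding POST returns an error response, while later POSTs are still processed.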
Running: infinity_emb v2 --model-id BAAI/bge-small-en-v1.5
Info: WSL2, Python 3.11.9, infinity_emb==0.0.39
Information
- Docker
- The CLI directly via pip
Tasks
- An officially supported command
- My own modifications
Reproduction
ERROR 2024-06-02 08:21:46,304 infinity_emb ERROR: shape '[2, 512]' is invalid for input of size 524288 batch_handler.py:434
Traceback (most recent call last):
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/infinity_emb/inference/batch_handler.py", line 423, in _core_batch
embed = self._model.encode_core(feat)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/infinity_emb/transformer/embedder/sentence_transformer.py", line 97, in encode_core
out_features: "Tensor" = self.forward(features)["sentence_embedding"]
^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/container.py", line 217, in forward
input = module(input)
^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/sentence_transformers/models/Transformer.py", line 117, in forward
output_states = self.auto_model(**trans_features, return_dict=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 1137, in forward
encoder_outputs = self.encoder(
^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/transformers/models/bert/modeling_bert.py", line 690, in forward
layer_outputs = layer_module(
^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/henri/anaconda3/envs/infinity_env/lib/python3.11/site-packages/optimum/bettertransformer/models/encoder_models.py", line 300, in forward
attention_mask = torch.reshape(attention_mask, (attention_mask.shape[0], attention_mask.shape[-1]))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[2, 512]' is invalid for input of size 524288
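The shape arithmetic in the error message hints at what went wrong. The following sketch (an inference from the reported numbers, not a confirmed diagnosis) shows why the reshape cannot succeed:

```python
# torch.reshape requires the target shape to hold exactly as many
# elements as the input tensor. The target (2, 512) holds
# 2 * 512 = 1024 elements, but the attention mask carried 524288.
# That count factors exactly as 2 * 512 * 512, suggesting an expanded
# [batch, seq, seq] mask arrived where a [batch, seq] one was expected.
batch, seq = 2, 512
target_elems = batch * seq   # what shape (2, 512) can hold
actual_elems = 524288        # input size from the error message
```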
Expected behavior
When a POST to /embeddings fails, I expect subsequent POSTs to still be processed.
Okay, that's concerning and should not happen.
There is no way to “autorecover”, e.g. in case you run out of memory. I assume that is the case here.
It is hard to guess the cause without more information about how you are using infinity via pip. Also check the usage instructions; I updated the tutorials recently. @vitteloil