Creating an OpenAI Embedding API Compatible Server Using an Open LLM Model
Purpose:
- For embedding models to generate well-separated embedding vectors, they need to be trained with a good tokenizer and a large amount of data.
- However, finding a suitable embedding model for multi-language environments can be challenging.
- The quality of the embeddings obtained this way depends on the performance of the underlying base model.
Limitations:
- Creating an embedding model by extracting layers from an LLM has inherent limitations: only the static token embeddings are reused, so the vectors are not contextualized by the model's deeper layers.
Process:
- Use mergekit to extract the 0th layer.
- Typically, this extraction includes the embed_tokens layer along with the RMSNorm layer.
- Use the following command to extract the model with mergekit:

```bash
# mergekit-yaml [config_file] [save_target_path]
mergekit-yaml ./example.yaml ./test
```
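The `example.yaml` config tells mergekit which layers to keep. Below is a minimal sketch, assuming mergekit's `passthrough` merge method; the model name and `layer_range` are illustrative assumptions, not the exact config used here:

```yaml
# Hypothetical example.yaml: keep only the 0th layer of the base model.
slices:
  - sources:
      - model: meta-llama/Llama-2-7b-hf  # assumed base model, for illustration
        layer_range: [0, 1]              # extract layer 0 only
merge_method: passthrough
dtype: float16
```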
- The `embedding` function in `server.py` handles the embedding extraction. It loads the model using `AutoModel` and `AutoTokenizer` from the `transformers` library and processes the tokens.
- The embeddings are extracted by passing the tokens through the loaded layers (see the sketch below).
- Note that positional encoding is not handled separately in this process.
- Since other embedding models do not require positional encoding, the code for it is not included.
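A minimal sketch of how this extraction might work, assuming `AutoModel`/`AutoTokenizer` and mean pooling over the token embeddings; the names and pooling choice are illustrative assumptions, not the exact `server.py` code:

```python
# Hypothetical sketch of the embedding extraction (illustrative, not the exact server.py).
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = './test/'  # the layer-0 model extracted with mergekit

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModel.from_pretrained(MODEL_PATH)
model.eval()

def embed(text: str) -> list:
    # Tokenize and look up the static token embeddings (embed_tokens);
    # no positional encoding is applied, matching the note above.
    inputs = tokenizer(text, return_tensors='pt')
    with torch.no_grad():
        token_embeddings = model.get_input_embeddings()(inputs['input_ids'])
    # Mean-pool the per-token vectors into a single sentence embedding.
    return token_embeddings.mean(dim=1).squeeze(0).tolist()
```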
- In the `server.py` code, you need to set the model name and path:

```python
models_list = {
    # Model Name: Model Path
    'text-embedding-3-small': './test/'
}
```
- The following line in the script sets the default model name:

```python
model = data.get('model', 'text-embedding-3-small')
```
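Putting these pieces together, the request handler might look roughly like the following. This is a sketch that assumes a Flask route shaped like the OpenAI embeddings API and reuses the hypothetical `embed()` function above; it is not the exact `server.py` implementation:

```python
# Hypothetical sketch of the Flask endpoint (assumed shape, not the exact server.py).
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/v1/embeddings', methods=['POST'])
def create_embedding():
    data = request.get_json()
    model = data.get('model', 'text-embedding-3-small')  # default model name
    text = data['input']
    # An OpenAI-compatible response wraps the vector in a "data" list.
    return jsonify({
        'object': 'list',
        'data': [{'object': 'embedding', 'index': 0, 'embedding': embed(text)}],
        'model': model,
    })

if __name__ == '__main__':
    app.run(port=5000)
```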
```bash
# Start the server
python server.py

# Test
curl -X POST -H "Content-Type: application/json" \
  -d '{"input": "hello", "model": "text-embedding-3-small"}' \
  http://localhost:5000/v1/embeddings
```
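The same test from Python, as a hypothetical client using `requests` (assumed to be installed separately); the response shape assumes the server mirrors the OpenAI embeddings schema:

```python
# Hypothetical client-side test; requires `pip install requests`.
import requests

resp = requests.post(
    'http://localhost:5000/v1/embeddings',
    json={'input': 'hello', 'model': 'text-embedding-3-small'},
)
# An OpenAI-compatible server returns the vector under data[0]['embedding'].
vector = resp.json()['data'][0]['embedding']
print(len(vector))  # embedding dimensionality, e.g. 4096 for a 7B LLaMA
```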
Required Libraries:
- Install the necessary libraries using the following command:

```bash
pip install flask torch transformers
```
License:
- MIT
Warning:
- The code has been tested for functionality, but the quality of the embeddings has not been evaluated.
- The following models have been tested:
- LLaMa v2 / v3
- Gemma 7B 1.1
- Mistral 7B