```bash
pip install --upgrade git+https://github.com/Parry-Parry/idaLLM.git
```
- LLM Serving: Run an efficient LLM inference endpoint via vLLM and FastAPI
- Local Inference: Execute prompts locally via vLLM (coupled with LightChain)
To run the API in its simplest form:

```bash
python -m idallm.api.serve --model <MODEL_ID> --host 0.0.0.0 --port 8080
```
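Once the server is up, you can send it requests over HTTP. Below is a minimal sketch in Python; note that the `/generate` route and the payload/response shapes are assumptions for illustration, so check `idallm.api.serve` for the actual contract:

```python
import requests

# NOTE: the "/generate" route and the JSON payload below are assumptions,
# not the confirmed idaLLM API; consult the idallm.api module for the
# real route names and request schema.
response = requests.post(
    "http://localhost:8080/generate",
    json={"prompt": "Summarise the benefits of efficient LLM serving."},
)
response.raise_for_status()
print(response.json())
```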
To run local inference in its simplest form (over a text file):

```bash
python -m idallm.local.serve --model <MODEL_ID> --input_file my_prompts.txt --output_file my_outputs.txt
```
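A small helper for producing the input file is sketched below. It assumes a one-prompt-per-line format, which is an inference from the flag names rather than documented behaviour, so verify how `idallm.local.serve` parses `--input_file`:

```python
# Write one prompt per line to my_prompts.txt.
# NOTE: the one-prompt-per-line format is an assumption; check how
# idallm.local.serve reads --input_file before relying on it.
prompts = [
    "What is retrieval-augmented generation?",
    "Give three use cases for dense retrieval.",
    "Explain BM25 in one sentence.",
]
with open("my_prompts.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(prompts) + "\n")
```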
- torch: Core tensor and neural-network library that runs the models
- transformers: Loads models and tokenizers from the Hugging Face Hub
- fastapi: Web framework backing the HTTP serving endpoint
- ray: Handles deployment and scaling of the serving endpoint
- vllm: Efficient, high-throughput model serving
- LightChain: Simplifies prompt formatting, chain prompting, and chat functionality
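For context on what the vllm dependency provides, here is a minimal standalone sketch of batched local inference using vLLM's own Python API. It illustrates the underlying library directly and is not idaLLM's internal implementation:

```python
from vllm import LLM, SamplingParams

# Standalone vLLM usage, independent of idaLLM: load a model and run
# batched generation over a list of prompts.
llm = LLM(model="<MODEL_ID>")  # any Hugging Face model ID supported by vLLM
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What is efficient LLM serving?"], params)
for out in outputs:
    print(out.outputs[0].text)
```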