With this in place, the LLM can be used as follows:

```python
llm = LLM()
```
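A minimal end-to-end sketch of what a call might look like, assuming the wrapper exposes a simple text-in/text-out interface (the import path and call style below are assumptions, not necessarily the repo's actual API):

```python
# Sketch only: adjust the import to match the module that defines LLM in this repo.
from llm import LLM  # hypothetical import path

llm = LLM()

# Assuming the wrapper is callable with a prompt and returns generated text.
answer = llm("Summarize what a quantized LLaMA 2 model is in one sentence.")
print(answer)
```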
- Currently, the quantized version of LLaMA 2 is hardcoded in the code, but it can be customized to support any model (see the sketch after this list).
- I'll also add a tutorial on how to use vector embeddings, either in this repo or a separate one.
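For example, if the quantized model is loaded through llama-cpp-python (an assumption; the repo may use a different backend), swapping in another GGUF model is mostly a matter of changing the model path:

```python
from llama_cpp import Llama

# Any quantized GGUF model can be dropped in here; the path below is a placeholder.
llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU (requires a GPU-enabled build)
)

out = llm("What is quantization?", max_tokens=128)
print(out["choices"][0]["text"])
```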
Don't forget to turn on the GPU.
Replace the URL in the notebook with the one generated by the server, and enjoy.
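As an illustration, here is one way the generated URL might be used from the notebook, assuming the server exposes an HTTP endpoint that accepts a JSON prompt (the endpoint name and payload shape are assumptions; adjust them to the actual API):

```python
import requests

# Replace with the URL printed by the server (e.g. an ngrok/localtunnel address).
SERVER_URL = "https://your-generated-url.example"

# Hypothetical endpoint and payload; match these to what the server actually exposes.
resp = requests.post(f"{SERVER_URL}/generate", json={"prompt": "Hello, LLM!"})
print(resp.status_code)
print(resp.text)
```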