Sample deployment configuration
- Python 3.8 or higher
- Install the required Python packages.

For a CPU-only build:

```shell
pip install llama-cpp-python
pip install flask
```

For a CUDA (GPU) build:

```shell
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
pip install flask
```

Alternatively, use conda. Download and install Anaconda Python from here, then:

```shell
conda create -n deeplearning python=3.8
conda activate deeplearning
conda config --add channels conda-forge
conda config --set channel_priority strict
conda install llama-cpp-python
pip install flask
```
- Download the required model file from here.
- Run the `app_cpu.py` script to start the Flask server (CPU):

```shell
python app_cpu.py
```
- Run the `app_cuda_gpu.py` script to start the Flask server (CUDA GPU):

```shell
python app_cuda_gpu.py
```
- In a new terminal, run the `post_request.py` script to send a POST request to the server:

```shell
python post_request.py
```
`app.py` is the main server file that uses Flask to create a web API. It uses the Llama library to generate responses based on the input message.
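The server described above can be sketched as a minimal Flask app. This is an illustrative outline only: the model path, prompt template, generation parameters, and response format are assumptions, not the repository's actual code.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def build_prompt(message):
    # Wrap the user message in a simple Q/A template (assumed format).
    return f"Q: {message}\nA:"

def generate_response(prompt):
    # Import lazily so the module loads even without the model installed.
    from llama_cpp import Llama
    llm = Llama(model_path="models/model.gguf")  # hypothetical path
    out = llm(prompt, max_tokens=256, stop=["Q:"])
    return out["choices"][0]["text"].strip()

@app.route("/api/deployment", methods=["POST"])
def deployment():
    message = request.get_json().get("message", "")
    return jsonify({"response": generate_response(build_prompt(message))})

# To start the development server:
#   app.run(host="0.0.0.0", port=5000)
```

In production you would serve `app` with a WSGI server rather than Flask's built-in one.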
`post_request.py` is a script that sends a POST request to the server with a message. The server then uses the Llama library to generate a response and sends it back to the client.
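A standard-library sketch of what such a client might look like (the actual script may use the `requests` package instead, and the `"response"` key in the reply is an assumption):

```python
import json
from urllib import request as urlrequest

url = "http://localhost:5000/api/deployment"
payload = {"message": "what are the best pesticides for crops in Kerala?"}
data = json.dumps(payload).encode("utf-8")
req = urlrequest.Request(
    url, data=data, headers={"Content-Type": "application/json"}
)

if __name__ == "__main__":
    try:
        with urlrequest.urlopen(req, timeout=30) as resp:
            # Assumes the server returns JSON with a "response" field.
            print(json.loads(resp.read())["response"])
    except OSError as exc:
        print(f"Could not reach the server: {exc}")
```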
You can send a POST request to `http://localhost:5000/api/deployment` with the following JSON body:

```json
{
  "message": "what are the best pesticides for crops in Kerala?"
}
```
The server will respond with the AI-generated answer.
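If you prefer not to use the Python client, the same request can be sent with `curl` (assuming the server is running locally on port 5000):

```shell
curl -X POST http://localhost:5000/api/deployment \
  -H "Content-Type: application/json" \
  -d '{"message": "what are the best pesticides for crops in Kerala?"}'
```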
Click here to learn more about deploying a Flask application in production.