This project integrates a LLAMA language model, served via llama.cpp, with a graphical user interface built on Gradio. It provides inference for LLAMA models in GGUF format together with a web-based interface for interacting with them. The server side is implemented with FastAPI, and the project can be used on its own by opening the root page of the deployed app.
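As a rough sketch of that architecture, a Gradio interface can be mounted at the root of a FastAPI app as shown below; the names (`app`, `demo`, `echo`) are illustrative and not necessarily the project's actual code:

```python
import gradio as gr
from fastapi import FastAPI

app = FastAPI()

def echo(message, history):
    # Placeholder handler; the real app calls the LLAMA model instead.
    return message

demo = gr.ChatInterface(fn=echo)

# Serve the Gradio UI at the root page of the FastAPI app.
app = gr.mount_gradio_app(app, demo, path="/")

# Run with: uvicorn gradio_app:app --host 0.0.0.0 --port 8000
```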
To install all dependencies and run the project locally, follow these steps:
- Create a virtual environment and activate it:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install the required Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the model. Make sure you have `wget` installed, then fetch the model with:

  ```bash
  wget -P src/models/ https://huggingface.co/IlyaGusev/saiga_llama3_8b_gguf/resolve/main/model-q4_K.gguf
  ```

  Alternatively, download any model in GGUF format and place it in the `src/models` directory. Don't forget to change the `MODEL_PATH` variable in the `.env` file to point at the model you want to use (see the example after this list).

- Run the Gradio app. From the project root, run:

  ```bash
  python3 src/gradio_app.py
  ```
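For reference, a minimal `.env` file consistent with the steps above might look like this; the exact contents are an assumption, and presumably `src/env.py` reads the file (e.g. via `python-dotenv`):

```
MODEL_PATH=src/models/model-q4_K.gguf
```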
To build and run a Docker container, follow these steps:
- Build the Docker image:

  ```bash
  docker build -t llama-gradio .
  ```

  Make sure the server is bound to `0.0.0.0` so it is accessible from outside the Docker container: change the `server_name` variable in `src/gradio_app.py` before building the image.

- Run the Docker container:

  ```bash
  docker run -p 8000:8000 --name llama_gradio_container llama-gradio
  ```
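For reference, a Dockerfile consistent with the commands above could look like the sketch below; this is an assumption, not the repository's actual Dockerfile:

```dockerfile
FROM python:3.11-slim

# llama-cpp-python compiles native code, so a C/C++ toolchain and CMake
# are assumed to be needed at install time.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

# gradio_app.py must bind to 0.0.0.0 (see the note above).
CMD ["python3", "src/gradio_app.py"]
```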
Once the server is running, open your web browser and navigate to http://127.0.0.1:8000 to interact with the Gradio interface. You can input text and get responses generated by the LLAMA model in real time.
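To illustrate how the pieces fit together, here is a minimal sketch of the inference side (compare `src/llama_inference.py`), assuming the project uses the `llama-cpp-python` bindings to load the GGUF model; the function name and parameters are illustrative, not the project's actual code:

```python
import os

from llama_cpp import Llama

# MODEL_PATH is assumed to come from the .env file (see the installation steps).
llm = Llama(model_path=os.getenv("MODEL_PATH", "src/models/model-q4_K.gguf"))

def generate(prompt: str) -> str:
    """Run a single chat completion against the loaded GGUF model."""
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return result["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(generate("Hello!"))
```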
The repository is laid out as follows:

```
LLAMA-CPP-WITH-GRADIO/
├── src/
│   ├── __pycache__/
│   ├── models/
│   │   ├── saiga-q4_K.gguf
│   │   └── download_guff.py
│   ├── env.py
│   ├── gradio_app.py
│   ├── llama_inference.py
│   └── requirements.txt
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt
```
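The `src/models/download_guff.py` helper presumably offers a Python alternative to the `wget` command above. A minimal sketch of such a downloader, assuming it uses `huggingface_hub` (the actual script may differ):

```python
from huggingface_hub import hf_hub_download

# Repo and filename match the wget command in the installation steps.
path = hf_hub_download(
    repo_id="IlyaGusev/saiga_llama3_8b_gguf",
    filename="model-q4_K.gguf",
    local_dir="src/models",
)
print(f"Model saved to {path}")
```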
Planned improvements:

- Build Docker container.
- Add LLAMA JSON output example.
- Add LLAMA function usage example.
Feel free to open an issue or submit a pull request. Contributions are welcome!