This project integrates a LLAMA language model, served via llama.cpp, with a graphical user interface built on Gradio. It provides inference for LLAMA models in GGUF format together with a web-based interface for interacting with them. The server side is implemented with FastAPI, and the project can be used on its own by opening the root page of the deployed app.
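As a rough sketch of that architecture, a Gradio interface can be mounted at the root of a FastAPI app as shown below; the names (`app`, `demo`, `echo`) are illustrative and not necessarily the project's actual code:

```python
import gradio as gr
from fastapi import FastAPI

app = FastAPI()

def echo(message, history):
    # Placeholder handler; the real app calls the LLAMA model instead.
    return message

demo = gr.ChatInterface(fn=echo)

# Serve the Gradio UI at the root page of the FastAPI app.
app = gr.mount_gradio_app(app, demo, path="/")

# Run with: uvicorn gradio_app:app --host 0.0.0.0 --port 8000
```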
To install all dependencies and run the project locally, follow these steps:
- Create a virtual environment and activate it:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install the required Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the model. Make sure you have `wget` installed, then fetch the model with:

  ```bash
  wget -P src/models/ https://huggingface.co/IlyaGusev/saiga_llama3_8b_gguf/resolve/main/model-q4_K.gguf
  ```

  Alternatively, download any model in GGUF format and place it in the `src/models` directory. Don't forget to change the `MODEL_PATH` variable in the `.env` file to point at the model you want to use (see the example after this list).

- Run the Gradio app. From the project root, run:

  ```bash
  python3 src/gradio_app.py
  ```
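For reference, a minimal `.env` file consistent with the steps above might look like this; the exact contents are an assumption, and presumably `src/env.py` reads the file (e.g. via `python-dotenv`):

```
MODEL_PATH=src/models/model-q4_K.gguf
```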
To build and run a Docker container, follow these steps:
- Build the Docker image:

  ```bash
  docker build -t llama-gradio .
  ```

  Make sure the server is bound to `0.0.0.0` so it is accessible from outside the Docker container: change the `server_name` variable in `src/gradio_app.py` before building the image.

- Run the Docker container:

  ```bash
  docker run -p 8000:8000 --name llama_gradio_container llama-gradio
  ```
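For reference, a Dockerfile consistent with the commands above could look like the sketch below; this is an assumption, not the repository's actual Dockerfile:

```dockerfile
FROM python:3.11-slim

# llama-cpp-python compiles native code, so a C/C++ toolchain and CMake
# are assumed to be needed at install time.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

# gradio_app.py must bind to 0.0.0.0 (see the note above).
CMD ["python3", "src/gradio_app.py"]
```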
Once the server is running, open your web browser and navigate to http://127.0.0.1:8000 to interact with the Gradio interface. You can input text and get responses generated by the LLAMA model in real time.
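To illustrate how the pieces fit together, here is a minimal sketch of the inference side (compare `src/llama_inference.py`), assuming the project uses the `llama-cpp-python` bindings to load the GGUF model; the function name and parameters are illustrative, not the project's actual code:

```python
import os

from llama_cpp import Llama

# MODEL_PATH is assumed to come from the .env file (see the installation steps).
llm = Llama(model_path=os.getenv("MODEL_PATH", "src/models/model-q4_K.gguf"))

def generate(prompt: str) -> str:
    """Run a single chat completion against the loaded GGUF model."""
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return result["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(generate("Hello!"))
```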
The repository is laid out as follows:

```
LLAMA-CPP-WITH-GRADIO/
├── src/
│   ├── __pycache__/
│   ├── models/
│   │   ├── saiga-q4_K.gguf
│   │   └── download_guff.py
│   ├── env.py
│   ├── gradio_app.py
│   ├── llama_inference.py
│   └── requirements.txt
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt
```
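The `src/models/download_guff.py` helper presumably offers a Python alternative to the `wget` command above. A minimal sketch of such a downloader, assuming it uses `huggingface_hub` (the actual script may differ):

```python
from huggingface_hub import hf_hub_download

# Repo and filename match the wget command in the installation steps.
path = hf_hub_download(
    repo_id="IlyaGusev/saiga_llama3_8b_gguf",
    filename="model-q4_K.gguf",
    local_dir="src/models",
)
print(f"Model saved to {path}")
```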
Planned improvements:

- Build Docker container.
- Add LLAMA JSON output example.
- Add LLAMA function usage example.
Feel free to open an issue or submit a pull request. Contributions are welcome!