LLAMA-CPP-WITH-GRADIO

This project integrates a LLaMA-family language model (run via llama.cpp) with a graphical user interface provided by Gradio. It includes inference capabilities for LLAMA models in GGUF format and provides a web-based interface for interaction. The server-side implementation is built with FastAPI; the project can be used independently by accessing the root page of the deployment.
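
For orientation, the wiring between llama-cpp-python, Gradio, and FastAPI looks roughly like the sketch below. This is a minimal illustration rather than the repository's actual src/gradio_app.py; the model path, context size, and generation parameters are assumptions.

    # Minimal sketch of the inference + UI wiring; the real gradio_app.py
    # may differ. Model path and parameters are assumptions.
    import gradio as gr
    import uvicorn
    from fastapi import FastAPI
    from llama_cpp import Llama

    # Load a GGUF model with llama-cpp-python.
    llm = Llama(model_path="src/models/model-q4_K.gguf", n_ctx=2048)

    def respond(message, history):
        # Turn the user's message into a chat-completion request.
        out = llm.create_chat_completion(
            messages=[{"role": "user", "content": message}],
            max_tokens=256,
        )
        return out["choices"][0]["message"]["content"]

    # Serve the Gradio chat UI from the FastAPI root page.
    app = gr.mount_gradio_app(FastAPI(), gr.ChatInterface(respond), path="/")

    if __name__ == "__main__":
        uvicorn.run(app, host="127.0.0.1", port=8000)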

Example

Table of Contents

  • Installation
  • Usage
  • Project Structure
  • Tasks
  • Contribution

Installation

Local Installation

To install all dependencies and run the project locally, follow these steps:

  1. Create a virtual environment and activate it:

    python3 -m venv venv
    source venv/bin/activate
  2. Install the required Python dependencies:

    pip install -r requirements.txt
  3. Download the model: Ensure you have wget installed. You can download the model using:

    wget -P src/models/ https://huggingface.co/IlyaGusev/saiga_llama3_8b_gguf/resolve/main/model-q4_K.gguf

    Or you can download any model in GGUF format and place it in the src/models directory. Don't forget to change the MODEL_PATH variable in the .env file to specify which model you want to use (see the .env sketch after this list).

  4. Run the Gradio app: From the project root, run the application:

    python3 src/gradio_app.py
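
The MODEL_PATH wiring mentioned in step 3 could look roughly like the sketch below; the exact variable names and layout of the repository's src/env.py are assumptions here:

    # .env (project root) -- the MODEL_PATH key comes from step 3 above:
    # MODEL_PATH=src/models/model-q4_K.gguf

    # env.py -- a sketch of reading the setting with python-dotenv; the
    # repository's actual src/env.py may differ.
    import os
    from dotenv import load_dotenv  # pip install python-dotenv

    load_dotenv()  # read variables from the .env file

    # Fall back to the model downloaded in step 3 if MODEL_PATH is unset.
    MODEL_PATH = os.getenv("MODEL_PATH", "src/models/model-q4_K.gguf")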

Docker Installation

To build and run a Docker container, follow these steps:

  1. Build the Docker image:

    docker build -t llama-gradio .

    Ensure the server is bound to 0.0.0.0 so that it is accessible from outside the Docker container: change the server_name variable in src/gradio_app.py before building the Docker image (see the snippet after this list).

  2. Run the Docker container:

    docker run -p 8000:8000 --name llama_gradio_container llama-gradio
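
The binding change from step 1 amounts to something like the following call in src/gradio_app.py; this is a sketch, and the exact launch invocation in the repository may differ (demo stands in for whatever the Interface/Blocks object is called):

    # Gradio's launch() exposes the bind address via server_name; 0.0.0.0
    # makes the app reachable from outside the container.
    demo.launch(server_name="0.0.0.0", server_port=8000)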

Usage

Once the server is running, open your web browser and navigate to http://127.0.0.1:8000 to interact with the Gradio interface. You can input text and receive responses generated by the LLAMA model in real time.
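
The running app can also be queried programmatically with gradio_client. A sketch, assuming a standard ChatInterface whose callable endpoint is named /chat (an assumption about the deployed interface):

    from gradio_client import Client  # pip install gradio_client

    client = Client("http://127.0.0.1:8000/")
    # "/chat" is the conventional api_name for a ChatInterface; adjust it
    # if the deployed app exposes a different endpoint.
    reply = client.predict("Hello!", api_name="/chat")
    print(reply)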

Project Structure

LLAMA-CPP-WITH-GRADIO/
├── src/
│   ├── __pycache__/
│   ├── models/
│   │   ├── saiga-q4_K.gguf
│   │   └── download_guff.py
│   ├── env.py
│   ├── gradio_app.py
│   ├── llama_inference.py
│   └── requirements.txt
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt
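
The tree lists src/models/download_guff.py, presumably a Python alternative to the wget step above. A hypothetical version using huggingface_hub, mirroring the wget URL (the helper's real contents are unknown):

    # Hypothetical downloader mirroring the wget command in the installation
    # steps; the repository's download_guff.py may differ.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    hf_hub_download(
        repo_id="IlyaGusev/saiga_llama3_8b_gguf",
        filename="model-q4_K.gguf",
        local_dir="src/models",
    )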

Tasks

  • Build Docker container.
  • Add llama JSON output example.
  • Add llama function usage example.

Contribution

Feel free to open an issue or submit a pull request. Contributions are welcome!