Docker Multiple Inference

This project sets up a Streamlit application that compares responses from a local Llama model (served by Ollama) and a Cerebras-hosted model side by side. The Streamlit app and Ollama each run in their own Docker container; the Cerebras model is reached through its cloud API.

Prerequisites

  • Docker
  • A Cerebras API key (obtain from cloud.cerebras.ai)

Setup Instructions

  1. Clone the repository:

    git clone https://github.com/your-username/docker-multiple-inference.git
    cd docker-multiple-inference
  2. Set up Ollama and load the Llama model (omit --gpus=all on hosts without an NVIDIA GPU; the final command opens an interactive session you can exit with /bye once the model has loaded):

    docker pull ollama/ollama
    docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
    docker exec -it ollama ollama run llama3:8b
  3. Pull the Cerebras Cloud SDK Docker image:

    docker pull cerebras/cerebras-cloud-sdk
  4. Set up Cerebras environment:

    export CEREBRAS_API_KEY="your-api-key-here"
  5. Create a file named requirements.txt with the following content:

    streamlit
    requests
    aiohttp
    
  6. Create a file named app.py containing the Python code for the dual-model chatbot (a minimal sketch is provided after this list).

  7. Create a Dockerfile with the following content:

    FROM cerebras/cerebras-cloud-sdk
    
    WORKDIR /app
    
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    
    COPY app.py .
    
    EXPOSE 8501
    
    CMD ["streamlit", "run", "app.py"]
  8. Build the Docker image:

    docker build -t docker-multiple-inference .
  9. Run the Docker container (the commands below assume a POSIX shell; in PowerShell, pass the key as $env:CEREBRAS_API_KEY). On macOS and Windows:

    docker run -p 8501:8501 -e CEREBRAS_API_KEY=$CEREBRAS_API_KEY docker-multiple-inference

    For Linux:

    docker run -p 8501:8501 --add-host=host.docker.internal:host-gateway -e CEREBRAS_API_KEY=$CEREBRAS_API_KEY docker-multiple-inference
  10. Access the application: Open a web browser and go to http://localhost:8501
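
For reference, here is a minimal sketch of what app.py (step 6) could look like. The endpoint URLs and model names (llama3:8b served by Ollama at host.docker.internal:11434, llama3.1-8b via the Cerebras chat-completions HTTP API) are assumptions to adjust to your own code; Cerebras is called over plain HTTP with requests so the sketch only needs the packages listed in requirements.txt.

    import os

    import requests
    import streamlit as st

    # Endpoints and model names are assumptions -- adjust them to match your setup.
    OLLAMA_URL = "http://host.docker.internal:11434/api/chat"
    OLLAMA_MODEL = "llama3:8b"
    CEREBRAS_URL = "https://api.cerebras.ai/v1/chat/completions"
    CEREBRAS_MODEL = "llama3.1-8b"


    def ask_ollama(messages):
        """Send the running conversation to the local Ollama container."""
        resp = requests.post(
            OLLAMA_URL,
            json={"model": OLLAMA_MODEL, "messages": messages, "stream": False},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]


    def ask_cerebras(prompt):
        """Send a single-turn prompt to the Cerebras chat-completions endpoint."""
        resp = requests.post(
            CEREBRAS_URL,
            headers={"Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}"},
            json={"model": CEREBRAS_MODEL, "messages": [{"role": "user", "content": prompt}]},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]


    st.title("Docker Multiple Inference")

    # History is kept only for the local model (see Usage below); past turns are
    # not re-rendered here to keep the sketch short.
    if "history" not in st.session_state:
        st.session_state.history = []

    prompt = st.chat_input("Enter your prompt")
    if prompt:
        st.session_state.history.append({"role": "user", "content": prompt})
        left, right = st.columns(2)
        with left:
            st.subheader("Local Llama (Ollama)")
            with st.spinner("Waiting for Ollama..."):
                answer = ask_ollama(st.session_state.history)
            st.session_state.history.append({"role": "assistant", "content": answer})
            st.write(answer)
        with right:
            st.subheader("Cerebras")
            with st.spinner("Waiting for Cerebras..."):
                st.write(ask_cerebras(prompt))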

Usage

  • Enter your prompt in the text input field at the bottom of the page.
  • The responses from the local Llama model and the Cerebras model appear side by side in real time (see the streaming sketch below).
  • Conversation history is maintained for the local model so it has context in subsequent interactions.
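
The sketch above returns each answer in one piece. If your app.py streams tokens as they arrive (the real-time behaviour described above), the Ollama request can set stream to true and feed the chunks to Streamlit. A minimal sketch, assuming Streamlit 1.31+ for st.write_stream and the same endpoint and model name as before:

    import json

    import requests
    import streamlit as st


    def stream_ollama(messages):
        """Yield content chunks from Ollama's /api/chat as they arrive."""
        with requests.post(
            "http://host.docker.internal:11434/api/chat",
            json={"model": "llama3:8b", "messages": messages, "stream": True},
            stream=True,
            timeout=300,
        ) as resp:
            resp.raise_for_status()
            for line in resp.iter_lines():
                if line:  # each non-empty line is one JSON chunk
                    yield json.loads(line).get("message", {}).get("content", "")


    # In the app, render the local model's reply incrementally:
    # st.write_stream(stream_ollama(st.session_state.history))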

Troubleshooting

  • If you encounter issues with the local Llama model, ensure that the Ollama container is running and the model is properly loaded (the check script below confirms both).
  • For Cerebras API issues, verify that your API key is correctly set and that you have an active subscription.
  • If the Streamlit app fails to start, check the Docker logs:
    docker logs $(docker ps -q --filter ancestor=docker-multiple-inference)
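
A quick way to confirm both prerequisites from the host is a small helper script (hypothetical, not part of the project): it checks Ollama's /api/tags endpoint and whether the Cerebras key is exported.

    # check_setup.py -- hypothetical helper, not part of the project.
    import os

    import requests

    # Is the Ollama server reachable, and has a model been pulled?
    try:
        tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
        models = [m["name"] for m in tags.get("models", [])]
        print("Ollama reachable; models:", models or "none pulled yet")
    except requests.RequestException as exc:
        print("Ollama not reachable:", exc)

    # Is the Cerebras API key exported in this shell?
    print("CEREBRAS_API_KEY set:", bool(os.environ.get("CEREBRAS_API_KEY")))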

Note

This setup uses the host.docker.internal DNS name so that the Streamlit app container can reach the Ollama container through port 11434 published on the host. This works out of the box on Windows and macOS; on Linux, the --add-host=host.docker.internal:host-gateway flag provides the same mapping.

Contributing

Contributions to improve docker-multiple-inference are welcome. Please feel free to submit a Pull Request.

License

This project is open source and available under the MIT License.