# docker-multiple-inference

This project sets up a Streamlit application that compares responses from a local Llama model (served by Ollama) and a Cerebras-hosted model side by side, all running in Docker containers.
## Prerequisites

- Docker
- A Cerebras API key (obtain one from [cloud.cerebras.ai](https://cloud.cerebras.ai))

## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/your-username/docker-multiple-inference.git
  cd docker-multiple-inference
  ```
- Set up Ollama with the Llama model:

  ```bash
  docker pull ollama/ollama
  docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  docker exec -it ollama ollama run llama3:8b
  ```
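  Before wiring Ollama into the app, it can be worth confirming that the container answers requests. A minimal sketch, assuming the default port mapping above and the `llama3:8b` tag; run it from the host, where the server is reachable at `localhost:11434`:

  ```python
  import requests

  # Ask the local Ollama server for a one-off completion.
  # /api/generate is Ollama's non-chat endpoint; "stream": False returns a single JSON object.
  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={"model": "llama3:8b", "prompt": "Say hello in one sentence.", "stream": False},
      timeout=120,
  )
  resp.raise_for_status()
  print(resp.json()["response"])
  ```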
- Pull the Cerebras Cloud SDK Docker image:

  ```bash
  docker pull cerebras/cerebras-cloud-sdk
  ```
- Set up the Cerebras environment:

  ```bash
  export CEREBRAS_API_KEY="your-api-key-here"
  ```
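  To verify the key works before building the app image, you can call the Cerebras Cloud SDK directly. A minimal sketch, assuming the SDK's Python client and a `llama3.1-8b` model name (substitute whichever model your account exposes):

  ```python
  import os
  from cerebras.cloud.sdk import Cerebras

  # The client can read CEREBRAS_API_KEY from the environment on its own;
  # passing it explicitly here makes the dependency obvious.
  client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])

  completion = client.chat.completions.create(
      model="llama3.1-8b",  # assumption: use a model available to your account
      messages=[{"role": "user", "content": "Say hello in one sentence."}],
  )
  print(completion.choices[0].message.content)
  ```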
- Create a file named `requirements.txt` with the following content:

  ```text
  streamlit
  requests
  aiohttp
  ```
- Create a file named `app.py` containing the Python code for the dual-model chatbot. A minimal sketch of what that file can look like follows this step.
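  The full chatbot code is not reproduced in this README, so treat the following as a minimal sketch of the same idea rather than the app itself: one prompt box, two columns, the left column calling the Ollama container through `host.docker.internal` and the right column calling Cerebras. The `OLLAMA_URL` environment variable, the `llama3:8b` tag, and the `llama3.1-8b` Cerebras model name are assumptions here, and the sketch returns each answer in one piece rather than rendering it in real time:

  ```python
  import os

  import requests
  import streamlit as st
  from cerebras.cloud.sdk import Cerebras  # provided by the cerebras/cerebras-cloud-sdk base image

  # Inside the app container, the Ollama container on the host is reachable via
  # host.docker.internal (native on macOS/Windows, added via --add-host on Linux).
  OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://host.docker.internal:11434")
  cerebras_client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])

  # Keep the running conversation for the local model so it sees prior turns.
  if "history" not in st.session_state:
      st.session_state.history = []

  st.title("Local Llama vs. Cerebras")
  prompt = st.chat_input("Enter your prompt")

  if prompt:
      st.session_state.history.append({"role": "user", "content": prompt})
      left, right = st.columns(2)

      with left:
          st.subheader("Local Llama (Ollama)")
          resp = requests.post(
              f"{OLLAMA_URL}/api/chat",
              json={"model": "llama3:8b", "messages": st.session_state.history, "stream": False},
              timeout=300,
          )
          resp.raise_for_status()
          local_answer = resp.json()["message"]["content"]
          st.write(local_answer)

      with right:
          st.subheader("Cerebras")
          completion = cerebras_client.chat.completions.create(
              model="llama3.1-8b",  # assumption: use a model available to your account
              messages=[{"role": "user", "content": prompt}],
          )
          st.write(completion.choices[0].message.content)

      # Only the local model's history is threaded back in, so it can provide
      # context in subsequent interactions.
      st.session_state.history.append({"role": "assistant", "content": local_answer})
  ```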
- Create a `Dockerfile` with the following content:

  ```Dockerfile
  FROM cerebras/cerebras-cloud-sdk
  WORKDIR /app
  COPY requirements.txt .
  RUN pip install --no-cache-dir -r requirements.txt
  COPY app.py .
  EXPOSE 8501
  CMD ["streamlit", "run", "app.py"]
  ```
- Build the Docker image:

  ```bash
  docker build -t docker-multiple-inference .
  ```
- Run the Docker container.

  For macOS and Windows:

  ```bash
  docker run -p 8501:8501 -e CEREBRAS_API_KEY=$CEREBRAS_API_KEY docker-multiple-inference
  ```

  For Linux:

  ```bash
  docker run -p 8501:8501 --add-host=host.docker.internal:host-gateway -e CEREBRAS_API_KEY=$CEREBRAS_API_KEY docker-multiple-inference
  ```
- Access the application: open a web browser and go to `http://localhost:8501`.

## Usage
- Enter your prompt in the text input field at the bottom of the page.
- The responses from both the local Llama model and the Cerebras model appear side by side in real time.
- The conversation history is maintained for the local model to provide context in subsequent interactions.
## Troubleshooting

- If you encounter issues with the local Llama model, ensure that the Ollama container is running and the model is properly loaded.
- For Cerebras API issues, verify that your API key is correctly set and that you have an active subscription.
- If the Streamlit app fails to start, check the Docker logs:

  ```bash
  docker logs $(docker ps -q --filter ancestor=docker-multiple-inference)
  ```
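To check the first point from inside the app container, listing the models the Ollama server reports is usually enough. A minimal sketch, assuming the default port and the `host.docker.internal` mapping described below:

```python
import requests

# /api/tags lists the models the Ollama server has available locally.
resp = requests.get("http://host.docker.internal:11434/api/tags", timeout=10)
resp.raise_for_status()
models = [m["name"] for m in resp.json()["models"]]
print("Ollama is reachable; models:", models)
```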
## How it works

This setup uses the `host.docker.internal` DNS name to allow the Streamlit app container to communicate with the Ollama container on the host machine. This works out of the box on Windows and macOS; on Linux, the `--add-host=host.docker.internal:host-gateway` flag enables the same mapping.
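On Linux, a quick way to confirm the `--add-host` mapping took effect is to resolve the name from inside the app container. A minimal check (not part of the app itself):

```python
import socket

# With --add-host=host.docker.internal:host-gateway, this resolves to the host's
# gateway IP from inside the container; without the flag it raises socket.gaierror.
print(socket.gethostbyname("host.docker.internal"))
```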
## Contributing

Contributions to improve docker-multiple-inference are welcome. Please feel free to submit a Pull Request.

## License

This project is open source and available under the MIT License.