This repository provides a simple example of setting up and using Ollama with the Ollama Python library. The code is available in this notebook.
Note: This repo is still WIP (not yet spell-checked)
Last update: Feb 1st, 2024
- Motivation
- Scope
- Setting up Ollama
- Setting Up Python Environment
- Running Ollama in Python
- Resources
- License
Until recently, running LLMs has required a large amount of computing resources, mainly GPUs. When running locally on an average Mac laptop, a simple prompt with a typical LLM takes about 10 minutes. The Ollama project addresses this issue and enables running LLMs locally, with or without GPU support, at high performance. It took me 16-18 seconds to run the Mistral model with Ollama on a Mac inside a dockerized environment with 4 CPUs and 8 GB RAM. Similarly, running simpler prompts with Llama2 and Vicuna took me ~40 and ~20 seconds, respectively. Note that this is not a robust benchmark and the model size is a big factor, but compared to the 10-minute run time of using the model directly without Ollama, it is a very promising result.
Please note the following:
- Currently, Ollama is available for macOS and Linux; the Windows version is coming soon (according to the project documentation)
- On a Mac, there is no GPU support when running inside a container, but the performance is still impressive
- The Ollama project is fairly new, so expect more changes and improvements before it reaches a stable version
The scope of this tutorial is setting up an environment for running the Ollama Python library on a local machine. This includes:
- Running Ollama locally with a virtual environment, or
- Running Ollama inside a dockerized environment
This includes testing the following models:
- Mistral
- Llama2
- Vicuna
I tested this notebook inside a dockerized environment with 4 CPUs and 8 GB RAM, and it worked well. Some models have minimum memory requirements, so it is worth checking a model's requirements and allocating resources accordingly.
This repo provides a simple example for Docker and virtual environment settings, and it includes the following files:
.
├── .devcontainer
│   ├── Dockerfile
│   ├── devcontainer.env
│   ├── devcontainer.json
│   ├── install_dependencies.sh
│   ├── install_quarto.sh
│   ├── install_requirements.sh
│   └── requirements.txt
├── README.md
├── ollama
│   ├── bin
│   ├── include
│   ├── lib
│   └── pyvenv.cfg
├── ollama-poc.ipynb
└── tests
    ├── test1.py
    └── test2.ipynb
The `.devcontainer` folder includes the Docker settings for VScode's Dev Containers extension, the `ollama` folder contains the Python virtual environment (in case you want to run locally), and the `ollama-poc.ipynb` notebook contains a code example.
To run this notebook, you will first need to install Ollama: go to the Download tab on the Ollama website, select your OS, and follow the instructions. Note: Currently, there is support for macOS and Linux; Windows support is, according to Ollama's website, coming soon.
Next, from the terminal:
- Start Ollama - Once installed, use the `ollama serve` command to launch the Ollama server. You should expect the following output:
2024/01/28 07:34:07 images.go:857: INFO total blobs: 0
2024/01/28 07:34:07 images.go:864: INFO total unused blobs removed: 0
2024/01/28 07:34:07 routes.go:950: INFO Listening on 127.0.0.1:11434 (version 0.1.22)
2024/01/28 07:34:07 payload_common.go:106: INFO Extracting dynamic libraries...
2024/01/28 07:34:10 payload_common.go:145: INFO Dynamic LLM libraries [cuda_v11 cpu]
2024/01/28 07:34:10 gpu.go:94: INFO Detecting GPU type
2024/01/28 07:34:10 gpu.go:236: INFO Searching for GPU management library libnvidia-ml.so
2024/01/28 07:34:10 gpu.go:282: INFO Discovered GPU libraries: []
2024/01/28 07:34:10 gpu.go:236: INFO Searching for GPU management library librocm_smi64.so
2024/01/28 07:34:10 gpu.go:282: INFO Discovered GPU libraries: []
2024/01/28 07:34:10 cpu_common.go:18: INFO CPU does not have vector extensions
2024/01/28 07:34:10 routes.go:973: INFO no GPU detected
[GIN] 2024/01/28 - 14:13:47 | 200 | 238.75µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/01/28 - 14:13:47 | 200 | 834.333µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2024/01/28 - 14:13:59 | 200 | 229.292µs | 127.0.0.1 | HEAD "/"
2024/01/28 14:14:02 download.go:123: INFO downloading e8a35b5937a5 in 42 100 MB part(s)
Note: When running Ollama inside Docker on a Mac machine, GPU support is not available
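Before pulling any models, you can verify from Python that the server is reachable. The following is a minimal sketch, assuming the default address shown in the log above (127.0.0.1:11434) and the `/api/tags` endpoint the server uses to list local models:

```python
# Minimal connectivity check against a locally running Ollama server.
# Assumes the default address (127.0.0.1:11434) from the server log above.
import json
import urllib.request

OLLAMA_TAGS_URL = "http://127.0.0.1:11434/api/tags"

try:
    with urllib.request.urlopen(OLLAMA_TAGS_URL, timeout=5) as resp:
        tags = json.load(resp)
    # The /api/tags endpoint returns a JSON object with a "models" list
    names = [m.get("name") for m in tags.get("models", [])]
    print("Ollama server is up. Local models:", names)
except OSError as err:
    print("Ollama server is not reachable:", err)
```

If the server is not running yet, the request fails with a connection error; in that case, run `ollama serve` first.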
- Download a model - Use the `pull` command to pull a model from the Ollama model registry. For example, the command below will download Mistral:
ollama pull mistral
pulling manifest
pulling e8a35b5937a5... 100% ▕███████████████████████████████▏ 4.1 GB
pulling 43070e2d4e53... 100% ▕███████████████████████████████▏ 11 KB
pulling e6836092461f... 100% ▕███████████████████████████████▏ 42 B
pulling ed11eda7790d... 100% ▕███████████████████████████████▏ 30 B
pulling f9b1e3196ecf... 100% ▕███████████████████████████████▏ 483 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Ollama currently supports a variety of LLMs; the full list is available here: https://ollama.ai/library
To view the list of models on your local machine, use the `list` command:
ollama list
NAME            ID              SIZE      MODIFIED
llama2:latest   78e26419b446    3.8 GB    3 minutes ago
mistral:latest  61e88e884507    4.1 GB    8 minutes ago
vicuna:latest   370739dc897b    3.8 GB    About a minute ago
We will use the above three models in this demo.
There are many ways to set up a Python environment. I ran this demo inside a dockerized environment, but you can also run it locally using a virtual environment (which might be more straightforward). If you go the local route with venv, start by creating a virtual environment named `ollama` (you can choose any other name):
python3 -m venv ollama
This will create a new folder named ollama under the root folder, which will include the following folders:
.
└── ollama
    ├── bin
    ├── include
    └── lib
To activate the environment, use:
source ollama/bin/activate
Next, we will install the ollama library using pip:
pip install ollama
To confirm that the installation was successful, open Python and test if you can import the library:
Python 3.10.10 (v3.10.10:aad5f6a891, Feb 7 2023, 08:47:40) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ollama
>>>
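You can also check which models are available on your machine directly from the library. The sketch below assumes the dictionary-style response returned by early versions of the Python library (newer releases may return a typed response object instead):

```python
# List the locally pulled models through the Ollama Python library.
# Early library versions return a plain dict with a "models" key.
import ollama

models = ollama.list()
for model in models.get("models", []):
    print(model.get("name"), model.get("size"))
```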
If you want to run the code in a Jupyter notebook, open Jupyter and set the notebook kernel to the `ollama` virtual environment (or whatever name you used when setting up the environment).
This repo has Docker settings for running Ollama inside a dockerized environment using VScode's Dev Containers extension. The caveat is that it works but is still experimental, and more optimization can be done. The `.devcontainer` folder contains the Docker settings for this POC:
.
└── .devcontainer
    ├── Dockerfile
    ├── devcontainer.env
    ├── devcontainer.json
    ├── install_dependencies.sh
    ├── install_quarto.sh
    ├── install_requirements.sh
    └── requirements.txt
I created this repo using this template and modified the `requirements.txt`, `devcontainer.json`, and `Dockerfile` files.
Here are the `devcontainer.json` settings:
{
"name": "ollama-poc",
// "image": "python:3.10",
"build": {
"dockerfile": "Dockerfile",
"args": {
"ENV_NAME": "ollama-poc",
"PYTHON_VER": "${localEnv:PYTHON_VER:3.10}",
"QUARTO_VER": "${localEnv:QUARTO_VER:1.3.450}"
}
},
"customizations": {
"settings": {
"python.defaultInterpreterPath": "/opt/conda/envs/ollama-poc/bin/python3",
"python.selectInterpreter": "/opt/conda/envs/ollama-poc/bin/python3"
},
"vscode": {
"extensions": [
// Documentation Extensions
"quarto.quarto",
"purocean.drawio-preview",
"redhat.vscode-yaml",
"yzhang.markdown-all-in-one",
// Docker Supporting Extensions
"ms-azuretools.vscode-docker",
"ms-vscode-remote.remote-containers",
// Python Extensions
"ms-python.python",
"ms-toolsai.jupyter",
// Github Actions
"github.vscode-github-actions"
]
}
},
// Optional, mount local volume:
// "mounts": [
// "source=${localEnv:DATA_FOLDER},target=/home/csv,type=bind,consistency=cache"
// ],
"remoteEnv": {
"MY_VAR": "${localEnv:MY_VAR:test_var}"
},
"runArgs": [
"--env-file",
".devcontainer/devcontainer.env"
],
"postCreateCommand": "ollama serve"
}
The Dockerfile sets up the Python environment and VScode extensions (e.g., Jupyter) and installs Ollama's Linux version. Once the folder opens inside the Dev Container, the post-create command launches the Ollama server.
The next step is to pull the required models from the terminal using the `pull` command, as described above:
ollama pull mistral
ollama pull llama2
ollama pull vicuna
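Alternatively, the models can be pulled programmatically with the library's `pull` function. This is a sketch rather than part of the original setup, and it assumes the Ollama server is already running inside the container:

```python
# Pull the three demo models through the Python library instead of the CLI.
# Assumes the Ollama server (ollama serve) is already running.
import ollama

for model in ["mistral", "llama2", "vicuna"]:
    print(f"Pulling {model}...")
    ollama.pull(model)
    print(f"{model} is ready")
```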
In the last step, open the notebook and choose the kernel using the `ollama-poc` Python environment (in line with the name set in the devcontainer.json file).
Alternatively, a more robust way is to use the official Ollama Docker image.
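When the server runs in a separate container, such as the official image with port 11434 published, the library's `Client` class can be pointed at that host. The address below is an assumption; adjust it to wherever the container is exposed:

```python
# Connect to an Ollama server that is not on the default local address,
# e.g., the official Docker image with port 11434 published.
import ollama

# Hypothetical address - replace with the host/port the container exposes.
client = ollama.Client(host="http://localhost:11434")
response = client.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```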
From this point, after launching the Ollama server and pulling the models, you can open Python and run a selected model. For example, here is the Mistral model in action:
Python 3.10.13 (main, Sep 11 2023, 13:18:45) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ollama
>>> response = ollama.chat(model='mistral', messages=[
... {
... 'role': 'user',
... 'content': 'Why is the sky blue?',
... },
... ])
>>> print(response['message']['content'])
The color of the sky appears blue due to a process called Rayleigh scattering. As sunlight reaches Earth's atmosphere, it interacts with different gases and particles in the air. Blue light has a shorter wavelength and gets scattered more easily than other colors, such as red or yellow, because it travels in smaller, shorter waves. This scattering of blue light is what makes the sky appear blue during a clear day. However, at sunrise and sunset, the sky can take on hues of pink, orange, and red due to the longer wavelengths of light being scattered more effectively by the atmosphere.
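The same pattern works for the other two models. The sketch below sends one prompt to each of the three models used in this demo and reports the wall-clock time; the exact timings depend on the allocated resources, so treat them as a rough comparison rather than a benchmark:

```python
# Send the same prompt to Mistral, Llama2, and Vicuna and time each response.
import time
import ollama

prompt = [{"role": "user", "content": "Why is the sky blue?"}]

for model in ["mistral", "llama2", "vicuna"]:
    start = time.time()
    response = ollama.chat(model=model, messages=prompt)
    elapsed = time.time() - start
    print(f"{model}: {elapsed:.1f} seconds")
    # Print only the beginning of each answer
    print(response["message"]["content"][:200], "...\n")
```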
- Ollama website - https://ollama.ai/
- Ollama source code - https://github.com/ollama
- Ollama Python and JS release notes - https://ollama.ai/blog/python-javascript-libraries
- Mistral LLM - https://huggingface.co/mistralai/Mistral-7B-v0.1
- Llama2 LLM - https://huggingface.co/meta-llama
- Vicuna LLM - https://huggingface.co/lmsys
- A Dockerized Python Development Environment Template - https://github.com/RamiKrispin/vscode-python-template
- Setting a Dockerized Python environment - https://medium.com/p/de2400c4812b
This tutorial is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.