The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with Ollama.
Before you can use the Ollama Python library locally, you need to complete a few steps: install Ollama, start an Ollama server on your local machine, and install the library with pip.
- Download and install Ollama.
- Install the Ollama Python library in your Python environment:

```shell
pip install ollama
```
- Run the Ollama server with a specific model:
- Example:

```shell
ollama run llama3
```
- Notes:
- The model you specify will be automatically downloaded and cached on your machine.
- You will need to repeat this step each time you restart your computer or kill the Ollama server.
- You can also run a specific version of a model:
- See available models here.
- Once at a model page, you can view specific versions and related version info by clicking on **tags** at the top of the model page and selecting the version you want.
- Example:

```shell
ollama run llama3:8b
```
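Model identifiers follow a `name:tag` format, as in `llama3:8b`. As a rough sketch of that convention (the helper name is hypothetical, and the `latest` default reflects common registry behavior):

```python
def split_model_name(model):
    """Split an Ollama-style model identifier into (name, tag)."""
    name, _, tag = model.partition(':')
    # When no tag is given, registries conventionally default to 'latest'.
    return name, tag or 'latest'

print(split_model_name('llama3:8b'))  # ('llama3', '8b')
print(split_model_name('llama3'))     # ('llama3', 'latest')
```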
- Example:

```python
import ollama

response = ollama.chat(
    # Note: this model should be the same one you started with `ollama run`
    model='llama3',
    messages=[
        {
            'role': 'user',
            'content': 'Why is the sky blue?',
        },
    ],
)
print(response['message']['content'])
```
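The `messages` list carries the entire conversation, so multi-turn chat works by appending each reply before the next call. A minimal sketch of that bookkeeping (the helper name is hypothetical; only the role/content message shape comes from the example above):

```python
def append_turn(history, role, content):
    """Return a new history list with one more {'role', 'content'} message appended."""
    return history + [{'role': role, 'content': content}]

history = []
history = append_turn(history, 'user', 'Why is the sky blue?')
# ...call ollama.chat(model='llama3', messages=history), then record the reply:
history = append_turn(history, 'assistant', 'Rayleigh scattering...')
history = append_turn(history, 'user', 'Explain it to a five-year-old.')
```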
Response streaming can be enabled by setting `stream=True`, modifying function calls to return a Python generator where each part is an object in the stream.
```python
import ollama

stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
```
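When you only need the final text, the streamed parts can be joined back together. A small hypothetical helper, assuming each part carries `message.content` as in the loop above:

```python
def collect_stream(parts):
    """Concatenate the message content of every streamed chunk into one string."""
    return ''.join(part['message']['content'] for part in parts)

# Works on any iterable of chunk-shaped dicts, such as the generator
# returned by ollama.chat(..., stream=True); dummy chunks shown here:
fake_stream = [{'message': {'content': 'Blue '}}, {'message': {'content': 'sky.'}}]
print(collect_stream(fake_stream))  # Blue sky.
```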
The Ollama Python library's API is designed around the Ollama REST API.
```python
# Chat
ollama.chat(model='llama3', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])

# Generate
ollama.generate(model='llama3', prompt='Why is the sky blue?')

# List local models
ollama.list()

# Show model information
ollama.show('llama3')

# Create a model from a Modelfile
modelfile = '''
FROM llama3
SYSTEM You are mario from super mario bros.
'''
ollama.create(model='example', modelfile=modelfile)

# Copy a model
ollama.copy('llama3', 'user/llama3')

# Delete a model
ollama.delete('llama3')

# Pull a model from the registry
ollama.pull('llama3')

# Push a model to the registry
ollama.push('user/llama3')

# Generate embeddings
ollama.embeddings(model='llama3', prompt='The sky is blue because of rayleigh scattering')
```
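Embeddings are typically compared with cosine similarity. A standard-library sketch, assuming the embeddings call returns a dict with an `'embedding'` list of floats (illustrated here with dummy vectors rather than live server output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# With a running server you might compare two prompts like this:
# e1 = ollama.embeddings(model='llama3', prompt='The sky is blue')['embedding']
# e2 = ollama.embeddings(model='llama3', prompt='Why is the sky blue?')['embedding']
# print(cosine_similarity(e1, e2))
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```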
A custom client can be created with the following fields:

- `host`: The Ollama host to connect to
- `timeout`: The timeout for requests
```python
from ollama import Client

client = Client(host='http://localhost:11434')
response = client.chat(model='llama3', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
```
```python
import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    response = await AsyncClient().chat(model='llama3', messages=[message])
    print(response['message']['content'])

asyncio.run(chat())
```
Setting `stream=True` modifies functions to return a Python asynchronous generator:
```python
import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    async for part in await AsyncClient().chat(model='llama3', messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)

asyncio.run(chat())
```
Errors are raised if requests return an error status or if an error is detected while streaming.
```python
import ollama

model = 'does-not-yet-exist'

try:
    ollama.chat(model)
except ollama.ResponseError as e:
    print('Error:', e.error)
    if e.status_code == 404:
        ollama.pull(model)
```
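The pull-on-404 pattern above can be folded into a retry helper. A hypothetical sketch: `chat_fn` and `pull_fn` stand in for `ollama.chat` and `ollama.pull`, and `error_type` for `ollama.ResponseError`, so the wrapper itself needs no running server:

```python
def chat_with_autopull(chat_fn, pull_fn, model, error_type):
    """Try a chat call; if the model is missing (HTTP 404), pull it once and retry."""
    try:
        return chat_fn(model)
    except error_type as e:
        if getattr(e, 'status_code', None) != 404:
            raise  # re-raise anything that is not a missing-model error
        pull_fn(model)
        return chat_fn(model)

# Usage with the real library might look like:
# chat_with_autopull(ollama.chat, ollama.pull, 'llama3', ollama.ResponseError)
```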