testcontainers/testcontainers-python

New Container: OllamaContainer

bricefotzo opened this issue · 6 comments

Add support for the OllamaContainer to simplify running and testing LLMs through Ollama.

What is the new container you'd like to have?

I would like to request support for a new container: OllamaContainer.

Why not just use a generic container for this?

The generic DockerContainer("ollama/ollama:latest") approach is not sufficient, for several reasons:

  1. Complicated setup/configuration: Ollama can run with GPU acceleration inside Docker containers on Nvidia GPUs. It's important to be able to check whether GPUs are available and, if so, start the container with access to them.

  2. Model management: There is also a need to pull a model and then commit the container's changes into an image, so that the image containing the model can be reused later (see the sketch just below this list).
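
For illustration, here is a rough sketch of what you would currently have to hand-roll on top of the generic container; the nvidia-runtime check, the exec-based pull and the commit call are assumptions about one possible approach (with illustrative image/model names), not the proposed OllamaContainer API:

import docker
from docker.types import DeviceRequest
from testcontainers.core.container import DockerContainer
from testcontainers.core.waiting_utils import wait_for_logs

client = docker.from_env()
# one plausible GPU check: is the nvidia runtime registered with the Docker daemon?
has_nvidia = "nvidia" in client.info().get("Runtimes", {})

container = DockerContainer("ollama/ollama:latest").with_exposed_ports(11434)
if has_nvidia:
    # request all available GPUs, mirroring `docker run --gpus=all`
    container = container.with_kwargs(
        device_requests=[DeviceRequest(count=-1, capabilities=[["gpu"]])]
    )

with container as ollama:
    # the exact log line may vary between Ollama versions
    wait_for_logs(ollama, "Listening on ")
    ollama.exec("ollama pull llama3")  # pull a model inside the running container
    # snapshot the container (model included) into an image that can be reused later
    ollama.get_wrapped_container().commit(repository="ollama/ollama", tag="llama3")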

does it make sense to mount a volume instead of/as well as committing a container?

@alexanderankin you raise a good point!

I asked myself the same question, but since it's implemented with commit in the Java and TypeScript versions, I figured that's probably because it's simpler and more robust to implement with commit.

With commit, you don't need to identify all the files generated by Ollama in order to set up the volume binding, whereas with a volume you have to specify every path that matters. The Ollama path is /root/.ollama, but are there other locations affected by pulling a model? I don't know, so given that doubt, the commit approach looks safer.

Plus, the commit approach keeps the test environment completely self-contained within the Docker image, which should help with reproducibility and portability across different test environments, I guess?
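
For comparison, the volume-based alternative being discussed would look roughly like this with a generic container; the host path, the /root/.ollama mount target and the model name are illustrative assumptions:

from pathlib import Path

from testcontainers.core.container import DockerContainer

ollama_data = Path.home() / ".ollama"

container = (
    DockerContainer("ollama/ollama:latest")
    .with_exposed_ports(11434)
    .with_volume_mapping(str(ollama_data), "/root/.ollama", mode="rw")
)

with container as ollama:
    # models pulled here land on the bind mount, so a later run of the same
    # container sees them without re-pulling or committing an image
    ollama.exec("ollama pull llama3")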

I think @ilopezluna or @eddumelendez have a better answer to your question

I have also realized that this is a more general concern which would also apply to the Java implementation. I've been looking at the various available images for Ollama as well; one theory is that the sizes vary based on the number of GPU drivers included, so the macOS build is quite small - but then using this image on a Mac would behave differently, so I'm also thinking about how to test this... it may be one of those images that only works on a Linux machine, which may be okay.

I added an ollama_dir option, which maybe should have been named ollama_home or something, but it does seem to work (there is a test as well).
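
For what it's worth, a test for that option could look roughly like this (a sketch assuming the option bind-mounts the given host directory to the container's /root/.ollama, with an illustrative small model):

from pathlib import Path

from testcontainers.ollama import OllamaContainer


def test_ollama_home_persists_models(tmp_path: Path):
    with OllamaContainer(ollama_home=tmp_path) as ollama:
        ollama.pull_model("all-minilm")  # small model, keeps the pull quick
    # after the container is gone, the pulled model files should still be on the host
    assert any(tmp_path.rglob("*")), "expected model files under ollama_home"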

Nice idea, so both are possible! Thanks @alexanderankin! Can't wait to try it.

why wait:

mkdir testcontainers-python-617 && cd $_
python -m venv .venv && source $_/bin/activate
pip install git+https://github.com/testcontainers/testcontainers-python@main

script.py:

from json import loads
from pathlib import Path

from requests import post
from testcontainers.ollama import OllamaContainer


def split_by_line(generator):
    # re-chunk the streamed response bytes into complete lines; the /api/chat
    # stream is newline-delimited JSON, one object per line
    data = b''
    for each_item in generator:
        for line in each_item.splitlines(True):
            data += line
            if data.endswith((b'\r\r', b'\n\n', b'\r\n\r\n', b'\n')):
                yield from data.splitlines()
                data = b''
    if data:
        yield from data.splitlines()


# reuse the host's ~/.ollama so already-pulled models don't have to be downloaded again
with OllamaContainer(ollama_home=Path.home() / ".ollama") as ollama:
    if "llama3:latest" not in [e["name"] for e in ollama.list_models()]:
        print("did not find 'llama3:latest', pulling")
        ollama.pull_model("llama3:latest")
    endpoint = ollama.get_endpoint()
    for chunk in split_by_line(
            post(url=f"{endpoint}/api/chat", stream=True, json={
                "model": "llama3:latest",
                "messages": [{"role": "user", "content": "what color is the sky?"}]
            })
    ):
        print(loads(chunk)["message"]["content"], end="")
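
If you don't need streaming, the same call can be made in one shot: with "stream": False the Ollama /api/chat endpoint returns a single JSON object instead of newline-delimited chunks, so the line-splitting helper isn't needed (same illustrative model and prompt as above):

from pathlib import Path

from requests import post
from testcontainers.ollama import OllamaContainer

with OllamaContainer(ollama_home=Path.home() / ".ollama") as ollama:
    if "llama3:latest" not in [e["name"] for e in ollama.list_models()]:
        ollama.pull_model("llama3:latest")
    response = post(
        url=f"{ollama.get_endpoint()}/api/chat",
        json={
            "model": "llama3:latest",
            "stream": False,
            "messages": [{"role": "user", "content": "what color is the sky?"}],
        },
    )
    print(response.json()["message"]["content"])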