Primary language: Python · License: MIT


MLLM

Multimodal Large Language Models


Installation

pip install mllm

Extra dependencies

Some features require extra dependencies.

For example, to use the Gemini models, install the gemini extra (the quotes keep shells such as zsh from expanding the brackets):

pip install "mllm[gemini]"

Usage

Create an MLLM router with a list of preferred models

import os
from mllm import Router

os.environ["OPENAI_API_KEY"] = "..."
os.environ["ANTHROPIC_API_KEY"] = "..."
os.environ["GEMINI_API_KEY"] = "..."

router = Router(
    preference=["gpt-4-turbo", "anthropic/claude-3-opus-20240229", "gemini/gemini-1.5-pro-latest"]
)
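
Before building the router, it can help to check which of the preferred models have an API key configured. The sketch below is not part of mllm and does not describe how Router resolves its preference list. The helper name usable_models and the prefix-to-variable mapping are assumptions made for illustration, including the assumption that an unprefixed model name belongs to OpenAI.

```python
import os

# Assumed mapping from model-name prefix to the API key variable set above.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def usable_models(preference, env=os.environ):
    """Return the preferred models whose provider API key is set, in order.

    Hypothetical helper: unprefixed names are assumed to be OpenAI models.
    """
    usable = []
    for model in preference:
        if "/" in model:
            key_name = PROVIDER_KEYS.get(model.split("/", 1)[0])
        else:
            key_name = "OPENAI_API_KEY"
        # Skip models with an unknown prefix or a missing/empty key.
        if key_name and env.get(key_name):
            usable.append(model)
    return usable
```

Passing the same list that is given to Router(preference=...) shows which entries could be served with the current environment.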

Create a new role-based chat thread

from mllm import RoleThread

thread = RoleThread(owner_id="dolores@agentsea.ai")
thread.post(role="user", msg="Describe the image", images=["data:image/jpeg;base64,..."])

Chat with the MLLM and store the prompt data in the namespace foo

response = router.chat(thread, namespace="foo")
thread.add_msg(response.msg)

Ask for a structured response

from pydantic import BaseModel

class Animal(BaseModel):
    species: str
    color: str

thread.post(
    role="user",
    msg=f"What animal is in this image? Please output as schema {Animal.model_json_schema()}",
    images=["data:image/jpeg;base64,..."]
)

response = router.chat(thread, namespace="animal", expect=Animal)
animal_parsed = response.parsed

assert isinstance(animal_parsed, Animal)
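
Models often wrap the requested JSON in extra prose, so a structured reply usually has to be located in the text before it can be validated. The following is a minimal standard-library sketch of that step only. It is not how mllm implements expect, and extract_json_object is a hypothetical helper name.

```python
import json


def extract_json_object(reply: str) -> dict:
    """Return the first JSON object embedded in a text reply (hypothetical helper)."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores any text that follows it.
            obj, _end = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")


reply = 'Here you go: {"species": "cat", "color": "black"} Hope that helps.'
animal = extract_json_object(reply)
# Field names taken from the Animal schema above.
missing = {"species", "color"} - animal.keys()
```

In the library workflow, passing expect=Animal to router.chat is what yields response.parsed; the sketch only illustrates the kind of extraction such a step involves.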

Find a saved thread or a prompt

from mllm import Prompt, RoleThread

RoleThread.find(id="123")
Prompt.find(id="456")

Store a raw OpenAI prompt

# RoleMessage is used below; importing it from mllm alongside RoleThread is assumed
from mllm import Prompt, RoleMessage, RoleThread

thread = RoleThread()

msg = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "What's in this image?",
        },
        {
            "type": "image_url",
            "image_url": {"url": "data:image/jpeg;base64,..."},
        }
    ]
}
role_message = RoleMessage.from_openai(msg)
thread.add_msg(role_message)

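# call_openai is a placeholder for your own function that sends the thread payload to the OpenAI API
response = call_openai(thread.to_openai())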
response_msg = RoleMessage.from_openai(response["choices"][0]["message"])

saved_prompt = Prompt(thread, response_msg, namespace="foo")

Add images to the thread in any supported form: base64 data URI, file path, PIL image, or URL

from PIL import Image

img1 = Image.open("img1.png")

thread.post(
    role="user",
    msg="What's in this image?",
    images=["data:image/jpeg;base64,...", "./img1.png", img1, "https://shorturl.at/rVyAS"]
)
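
The base64 form used above is a data URI. If you need to build one yourself from a file on disk, a minimal standard-library sketch might look like the following. The helper to_data_uri is hypothetical and not part of mllm, and the fallback MIME type is an assumption.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str, default_mime: str = "image/jpeg") -> str:
    """Encode an image file as a base64 data URI (hypothetical helper, not part of mllm)."""
    # Guess the MIME type from the file extension; fall back to the default.
    mime, _encoding = mimetypes.guess_type(path)
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime or default_mime};base64,{encoded}"
```

The returned string can be passed in the images list in the same way as the "data:image/jpeg;base64,..." entry above.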

Integrations

MLLM is integrated with:

  • Taskara: A task management library for AI agents
  • Skillpacks: A library to fine-tune AI agents on tasks
  • Surfkit: A platform for AI agents
  • Threadmem: A thread management library for AI agents

Community

Come join us on Discord.

Backends

Thread and prompt storage can be backed by:

  • SQLite
  • PostgreSQL

SQLite is used by default. To use PostgreSQL, set the following environment variables:

DB_TYPE=postgres
DB_NAME=mllm
DB_HOST=localhost
DB_USER=postgres
DB_PASS=abc123
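
To sanity-check these settings before starting an application, one option is to read them back and assemble a standard PostgreSQL connection URI. This is a sketch under stated assumptions: postgres_url_from_env is a hypothetical helper, mllm may build its connection differently, and no port is included because the variables above do not define one.

```python
import os
from urllib.parse import quote


def postgres_url_from_env(env=os.environ) -> str:
    """Build a PostgreSQL URI from the DB_* variables (hypothetical helper, not part of mllm)."""
    if env.get("DB_TYPE") != "postgres":
        raise ValueError("DB_TYPE is not set to 'postgres'")
    required = ("DB_NAME", "DB_HOST", "DB_USER", "DB_PASS")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    # Percent-encode credentials so characters such as '@' or ':' stay unambiguous.
    user = quote(env["DB_USER"], safe="")
    password = quote(env["DB_PASS"], safe="")
    return f"postgresql://{user}:{password}@{env['DB_HOST']}/{env['DB_NAME']}"
```

Calling the function with a plain dict instead of os.environ makes it easy to test a configuration without exporting anything.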

By default, thread images are stored in the database. To store them in a Google Cloud Storage (GCS) bucket instead:

  • Create a bucket with fine-grained permissions
  • Create a GCP service account JSON key with permission to write to the bucket
  • Export the service account JSON and the bucket name as environment variables:
export THREAD_STORAGE_SA_JSON='{
  "type": "service_account",
  ...
}'
export THREAD_STORAGE_BUCKET=my-bucket
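
A malformed service account value is easy to miss because it is passed as one quoted string. The sketch below checks that both variables are set and that the JSON parses and declares "type": "service_account", as in the export above. The helper check_thread_storage_env is hypothetical and not part of mllm, and it does not verify that the credentials grant access to the bucket.

```python
import json
import os


def check_thread_storage_env(env=os.environ) -> str:
    """Return the bucket name if the GCS settings look well-formed (hypothetical helper)."""
    bucket = env.get("THREAD_STORAGE_BUCKET")
    raw = env.get("THREAD_STORAGE_SA_JSON")
    if not bucket or not raw:
        raise ValueError("THREAD_STORAGE_BUCKET and THREAD_STORAGE_SA_JSON must both be set")
    try:
        sa = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"THREAD_STORAGE_SA_JSON is not valid JSON: {exc}") from exc
    # Only the "type" field shown in the export above is checked here.
    if not isinstance(sa, dict) or sa.get("type") != "service_account":
        raise ValueError('THREAD_STORAGE_SA_JSON must be an object with "type": "service_account"')
    return bucket
```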