
ModelFusion

The TypeScript library for building multi-modal AI applications.


Introduction | Quick Install | Usage | Documentation | Examples | Contributing | modelfusion.dev

Introduction

ModelFusion is a TypeScript library for building AI applications, chatbots, and agents.

  • Vendor-neutral: ModelFusion is a non-commercial open source project that is community-driven. You can use it with any supported provider.
  • Multi-modal: ModelFusion supports a wide range of models including text generation, image generation, vision, text-to-speech, speech-to-text, and embedding models.
  • Streaming: ModelFusion supports streaming for many generation models, e.g. text streaming, structure streaming, and full duplex speech streaming.
  • Utility functions: ModelFusion provides functionality for tools and tool usage, vector indices, and guard functions.
  • Type inference and validation: ModelFusion infers TypeScript types wherever possible and validates model responses.
  • Observability and logging: ModelFusion provides an observer framework and out-of-the-box logging support.
  • Resilience and Robustness: ModelFusion ensures seamless operation through automatic retries, throttling, and error handling mechanisms.
  • Server: ModelFusion provides a Fastify plugin that exposes a ModelFusion flow as a REST endpoint that uses server-sent events.

Note

ModelFusion is in its initial development phase. The main API is now mostly stable, but until version 1.0 there may be breaking changes. Feedback and suggestions are welcome.

Quick Install

npm install modelfusion

Or use one of the starter templates.

Usage Examples

Tip

The basic examples are a great way to get started and to explore in parallel with the documentation. You can find them in the examples/basic folder.

You can provide API keys for the different integrations using environment variables (e.g., OPENAI_API_KEY) or pass them into the model constructors as options.
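
For example, an API key can be passed in explicitly through an API configuration (a minimal sketch; the placeholder key is an assumption):

import { generateText, openai, OpenAIApiConfiguration } from "modelfusion";

const text = await generateText(
  openai.CompletionTextGenerator({
    // pass the key directly instead of relying on process.env.OPENAI_API_KEY:
    api: new OpenAIApiConfiguration({ apiKey: "sk-..." }),
    model: "gpt-3.5-turbo-instruct",
  }),
  "Say hello."
);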

Generate text using a language model and a prompt. You can stream the text if the model supports it. You can use images for multi-modal prompting if the model supports it (e.g. with llama.cpp). You can use prompt templates to change the prompt format of a model.

generateText

import { generateText, openai } from "modelfusion";

const text = await generateText(
  openai.CompletionTextGenerator({ model: "gpt-3.5-turbo-instruct" }),
  "Write a short story about a robot learning to love:\n\n"
);

Providers: OpenAI, OpenAI compatible, Llama.cpp, Ollama, Hugging Face, Cohere, Anthropic

streamText

import { streamText, openai } from "modelfusion";

const textStream = await streamText(
  openai.CompletionTextGenerator({ model: "gpt-3.5-turbo-instruct" }),
  "Write a short story about a robot learning to love:\n\n"
);

for await (const textPart of textStream) {
  process.stdout.write(textPart);
}

Providers: OpenAI, OpenAI compatible, Llama.cpp, Ollama, Cohere, Anthropic

streamText with multi-modal prompt

Multi-modal vision models such as GPT-4 Vision can process images as part of the prompt.

import fs from "node:fs";
import { streamText, openai, OpenAIChatMessage } from "modelfusion";

// load an image and encode it as base64 (example file path):
const image = fs.readFileSync("example-image.png").toString("base64");

const textStream = await streamText(
  openai.ChatTextGenerator({ model: "gpt-4-vision-preview" }),
  [
    OpenAIChatMessage.user([
      { type: "text", text: "Describe the image in detail:" },
      { type: "image", base64Image: image, mimeType: "image/png" },
    ]),
  ]
);

Providers: OpenAI, OpenAI compatible, Llama.cpp

Generate an image from a prompt.

import { generateImage, openai } from "modelfusion";

const image = await generateImage(
  openai.ImageGenerator({ model: "dall-e-3", size: "1024x1024" }),
  "the wicked witch of the west in the style of early 19th century painting"
);
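
The generated image can then be written straight to disk, assuming the result is a Buffer with the image data (a usage sketch):

import fs from "node:fs";

fs.writeFileSync("image.png", image);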

Providers: OpenAI (Dall·E), Stability AI, Automatic1111

Synthesize speech (audio) from text. Also called TTS (text-to-speech).

generateSpeech

generateSpeech synthesizes speech from text.

import { generateSpeech, lmnt } from "modelfusion";

// `speech` is a Buffer with MP3 audio data
const speech = await generateSpeech(
  lmnt.SpeechGenerator({
    voice: "034b632b-df71-46c8-b440-86a42ffc3cf3", // Henry
  }),
  "Good evening, ladies and gentlemen! Exciting news on the airwaves tonight " +
    "as The Rolling Stones unveil 'Hackney Diamonds,' their first collection of " +
    "fresh tunes in nearly twenty years, featuring the illustrious Lady Gaga, the " +
    "magical Stevie Wonder, and the final beats from the late Charlie Watts."
);

Providers: Eleven Labs, LMNT, OpenAI

streamSpeech

streamSpeech generates a stream of speech chunks from text or from a text stream. Depending on the model, this can be fully duplex.

import { streamSpeech, elevenlabs } from "modelfusion";

declare const textStream: AsyncIterable<string>; // e.g. from streamText

const speechStream = await streamSpeech(
  elevenlabs.SpeechGenerator({
    model: "eleven_turbo_v2",
    voice: "pNInz6obpgDQGcFmaJgB", // Adam
    optimizeStreamingLatency: 1,
    voiceSettings: { stability: 1, similarityBoost: 0.35 },
    generationConfig: {
      chunkLengthSchedule: [50, 90, 120, 150, 200],
    },
  }),
  textStream
);

for await (const part of speechStream) {
  // each part is a Buffer with MP3 audio data
}

Providers: Eleven Labs

Transcribe speech (audio) data into text. Also called speech-to-text (STT).

import fs from "node:fs";
import { generateTranscription, openai } from "modelfusion";

const transcription = await generateTranscription(
  openai.Transcriber({ model: "whisper-1" }),
  {
    type: "mp3",
    data: await fs.promises.readFile("data/test.mp3"),
  }
);

Providers: OpenAI (Whisper), Whisper.cpp

Generate typed objects using a language model and a schema.

generateStructure

Generate a structure that matches a schema.

import { z } from "zod";
import { zodSchema, generateStructure, openai } from "modelfusion";

const sentiment = await generateStructure(
  // model:
  openai
    .ChatTextGenerator({
      model: "gpt-3.5-turbo",
      temperature: 0,
      maxCompletionTokens: 50,
    })
    .asFunctionCallStructureGenerationModel({ fnName: "sentiment" })
    .withInstructionPrompt(),

  // schema:
  zodSchema(
    z.object({
      sentiment: z
        .enum(["positive", "neutral", "negative"])
        .describe("Sentiment."),
    })
  ),

  // prompt:
  {
    system:
      "You are a sentiment evaluator. " +
      "Analyze the sentiment of the following product review:",
    instruction:
      "After I opened the package, I was met by a very unpleasant smell " +
      "that did not disappear even after washing. Never again!",
  }
);

Providers: OpenAI, Ollama

streamStructure

Stream a structure that matches a schema. Partial structures before the final part are untyped JSON.

import { z } from "zod";
import { zodSchema, openai, streamStructure } from "modelfusion";

const structureStream = await streamStructure(
  openai
    .ChatTextGenerator(/* ... */)
    .asFunctionCallStructureGenerationModel({
      fnName: "generateCharacter",
      fnDescription: "Generate character descriptions.",
    })
    .withTextPrompt(),

  zodSchema(
    z.object({
      characters: z.array(
        z.object({
          name: z.string(),
          class: z
            .string()
            .describe("Character class, e.g. warrior, mage, or thief."),
          description: z.string(),
        })
      ),
    })
  ),

  "Generate 3 character descriptions for a fantasy role playing game."
);

for await (const part of structureStream) {
  if (!part.isComplete) {
    const unknownPartialStructure = part.value;
    console.log("partial value", unknownPartialStructure);
  } else {
    const fullyTypedStructure = part.value;
    console.log("final value", fullyTypedStructure);
  }
}

Providers: OpenAI, Ollama

Create embeddings for text and other values. Embeddings are vectors that represent the essence of the values in the context of the model.

import { embed, embedMany, openai } from "modelfusion";

// embed single value:
const embedding = await embed(
  openai.TextEmbedder({ model: "text-embedding-ada-002" }),
  "At first, Nox didn't know what to do with the pup."
);

// embed many values:
const embeddings = await embedMany(
  openai.TextEmbedder({ model: "text-embedding-ada-002" }),
  [
    "At first, Nox didn't know what to do with the pup.",
    "He keenly observed and absorbed everything around him, from the birds in the sky to the trees in the forest.",
  ]
);

Providers: OpenAI, Llama.cpp, Ollama, Hugging Face, Cohere
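
Embedding vectors can be compared to measure how similar two texts are, e.g. with cosine similarity (a sketch assuming modelfusion's cosineSimilarity helper):

import { cosineSimilarity } from "modelfusion";

// vectors from the embedMany example above; values closer to 1 mean more similar texts:
const similarity = cosineSimilarity(embeddings[0], embeddings[1]);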

Split text into tokens and reconstruct the text from tokens.

import { countTokens, openai } from "modelfusion";

const tokenizer = openai.Tokenizer({ model: "gpt-4" });

const text = "At first, Nox didn't know what to do with the pup.";

const tokenCount = await countTokens(tokenizer, text);

const tokens = await tokenizer.tokenize(text);
const tokensAndTokenTexts = await tokenizer.tokenizeWithTexts(text);
const reconstructedText = await tokenizer.detokenize(tokens);

Providers: OpenAI, Llama.cpp, Cohere

Guard functions can be used to implement retry on error, redacting or changing responses, and more.

Retry structure parsing on error

import {
  guard,
  fixStructure,
  generateStructure,
  zodSchema,
  openai,
  OpenAIChatMessage,
} from "modelfusion";

const result = await guard(
  (input, options) =>
    generateStructure(
      openai
        .ChatTextGenerator({
          // ...
        })
        .asFunctionCallStructureGenerationModel({
          fnName: "myFunction",
        }),
      zodSchema({
        // ...
      }),
      input,
      options
    ),
  [
    // ...
  ],
  fixStructure({
    modifyInputForRetry: async ({ input, error }) => [
      ...input,
      OpenAIChatMessage.assistant(null, {
        functionCall: {
          name: "sentiment",
          arguments: JSON.stringify(error.valueText),
        },
      }),
      OpenAIChatMessage.user(error.message),
      OpenAIChatMessage.user("Please fix the error and try again."),
    ],
  })
);

Tools are functions that can be executed by an AI model. They are useful for building chatbots and agents.

Predefined tools: SerpAPI, Google Custom Search

A tool consists of an async execute function, a name, a description, and a schema for its input parameters.

import { Tool, zodSchema } from "modelfusion";
import { z } from "zod";

const calculator = new Tool({
  name: "calculator",
  description: "Execute a calculation",

  parameters: zodSchema(
    z.object({
      a: z.number().describe("The first number."),
      b: z.number().describe("The second number."),
      operator: z
        .enum(["+", "-", "*", "/"])
        .describe("The operator (+, -, *, /)."),
    })
  ),

  execute: async ({ a, b, operator }) => {
    switch (operator) {
      case "+":
        return a + b;
      case "-":
        return a - b;
      case "*":
        return a * b;
      case "/":
        return a / b;
      default:
        throw new Error(`Unknown operator: ${operator}`);
    }
  },
});

With generateToolCall, you can generate a tool call for a specific tool with a language model that supports tool calls (e.g. OpenAI Chat). This function does not execute the tool.

const { id, name, args } = await generateToolCall(
  openai.ChatTextGenerator({ model: "gpt-3.5-turbo" }),
  calculator,
  [OpenAIChatMessage.user("What's fourteen times twelve?")]
);

With generateToolCallsOrText, you can ask a language model to generate several tool calls as well as text. The model will choose which tools (if any) should be called with which arguments. Both the text and the tool calls are optional. This function does not execute the tools.

const { text, toolCalls } = await generateToolCallsOrText(
  openai.ChatTextGenerator({ model: "gpt-3.5-turbo" }),
  [toolA, toolB, toolC],
  [OpenAIChatMessage.user(query)]
);

You can directly invoke a tool with executeTool:

const result = await executeTool(calculator, {
  a: 14,
  b: 12,
  operator: "*",
});

With useTool, you can use a tool with a language model that supports tool calls (e.g. OpenAI Chat). useTool first generates a tool call and then executes the tool with the generated arguments.

const { tool, toolCall, args, ok, result } = await useTool(
  openai.ChatTextGenerator({ model: "gpt-3.5-turbo" }),
  calculator,
  [OpenAIChatMessage.user("What's fourteen times twelve?")]
);

console.log(`Tool call:`, toolCall);
console.log(`Tool:`, tool);
console.log(`Arguments:`, args);
console.log(`Ok:`, ok);
console.log(`Result or Error:`, result);

With useToolsOrGenerateText, you can ask a language model to generate several tool calls as well as text. The model will choose which tools (if any) should be called with which arguments. Both the text and the tool calls are optional. This function executes the tools.

const { text, toolResults } = await useToolsOrGenerateText(
  openai.ChatTextGenerator({ model: "gpt-3.5-turbo" }),
  [calculator /* ... */],
  [OpenAIChatMessage.user("What's fourteen times twelve?")]
);

You can use useToolsOrGenerateText to implement an agent loop that responds to user messages and executes tools. Learn more.
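
A minimal agent-loop sketch (assuming the calculator tool from above; how tool results are reported back to the model is an assumption and depends on the prompt format):

const messages = [OpenAIChatMessage.user("What's fourteen times twelve?")];

while (true) {
  const { text, toolResults } = await useToolsOrGenerateText(
    openai.ChatTextGenerator({ model: "gpt-3.5-turbo" }),
    [calculator],
    messages
  );

  // the model answered with plain text, so the loop is done:
  if (text != null) {
    console.log(text);
    break;
  }

  // otherwise, append each tool result to the conversation and iterate:
  for (const { tool, result } of toolResults ?? []) {
    messages.push({
      role: "function", // assumption: raw OpenAI function-result messages are accepted here
      name: tool,
      content: JSON.stringify(result),
    });
  }
}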

import {
  MemoryVectorIndex,
  VectorIndexRetriever,
  openai,
  retrieve,
  upsertIntoVectorIndex,
} from "modelfusion";

const texts = [
  "A rainbow is an optical phenomenon that can occur under certain meteorological conditions.",
  "It is caused by refraction, internal reflection and dispersion of light in water droplets resulting in a continuous spectrum of light appearing in the sky.",
  // ...
];

const vectorIndex = new MemoryVectorIndex<string>();
const embeddingModel = openai.TextEmbedder({
  model: "text-embedding-ada-002",
});

// update an index - usually done as part of an ingestion process:
await upsertIntoVectorIndex({
  vectorIndex,
  embeddingModel,
  objects: texts,
  getValueToEmbed: (text) => text,
});

// retrieve text chunks from the vector index - usually done at query time:
const retrievedTexts = await retrieve(
  new VectorIndexRetriever({
    vectorIndex,
    embeddingModel,
    maxResults: 3,
    similarityThreshold: 0.8,
  }),
  "rainbow and water droplets"
);

Available Vector Stores: Memory, SQLite VSS, Pinecone

Prompt templates let you use higher-level prompt structures (such as text, instruction, or chat prompts) for different models.

Text Prompt Example

import { generateText, anthropic } from "modelfusion";

const text = await generateText(
  anthropic
    .TextGenerator({
      model: "claude-instant-1",
    })
    .withTextPrompt(),
  "Write a short story about a robot learning to love"
);

Instruction Prompt Example

// example assumes you are running https://huggingface.co/TheBloke/Llama-2-7B-GGUF with llama.cpp
import { generateText, llamacpp, Llama2Prompt } from "modelfusion";

const text = await generateText(
  llamacpp
    .TextGenerator({
      contextWindowSize: 4096, // Llama 2 context window size
      maxCompletionTokens: 1000,
    })
    .withTextPromptTemplate(Llama2Prompt.instruction()),
  {
    system: "You are a story writer.",
    instruction: "Write a short story about a robot learning to love.",
  }
);

Prompt templates can also be applied through the shorthand methods .withTextPrompt(), .withChatPrompt(), and .withInstructionPrompt() on many models:

Chat Prompt Example

import { streamText, openai } from "modelfusion";

const textStream = await streamText(
  openai
    .ChatTextGenerator({
      model: "gpt-3.5-turbo",
    })
    .withChatPrompt(),
  {
    system: "You are a celebrated poet.",
    messages: [
      {
        role: "user",
        content: "Suggest a name for a robot.",
      },
      {
        role: "assistant",
        content: "I suggest the name Robbie",
      },
      {
        role: "user",
        content: "Write a short story about Robbie learning to love",
      },
    ],
  }
);
Text, instruction, and chat prompt templates are available for the following model formats (support varies by model): OpenAI Chat, Anthropic, Llama 2, ChatML, NeuralChat, Alpaca, Vicuna, and generic text.

You can use prompt templates with image models as well, e.g. to use a basic text prompt. A shorthand method is available:

import { generateImage, stability } from "modelfusion";

const image = await generateImage(
  stability
    .ImageGenerator({
      //...
    })
    .withTextPrompt(),
  "the wicked witch of the west in the style of early 19th century painting"
);
Text prompt templates are available for Automatic1111 and Stability.

Metadata and original responses

ModelFusion model functions return rich results that include the original response and metadata when you set the returnType option to full.

import { generateText, openai, OpenAICompletionResponse } from "modelfusion";

// access the full response (needs to be typed) and the metadata:
const { value, response, metadata } = await generateText(
  openai.CompletionTextGenerator({
    model: "gpt-3.5-turbo-instruct",
    maxCompletionTokens: 1000,
    n: 2, // generate 2 completions
  }),
  "Write a short story about a robot learning to love:\n\n",
  { returnType: "full" }
);

console.log(metadata);

// cast to the response type:
for (const choice of (response as OpenAICompletionResponse).choices) {
  console.log(choice.text);
}

Logging and Observability

ModelFusion provides an observer framework and out-of-the-box logging support. You can easily trace runs and call hierarchies, and you can add your own observers.

Global Logging Example

import { modelfusion } from "modelfusion";

modelfusion.setLogFormat("detailed-object"); // log full events
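
You can also attach observers to individual function calls. A sketch of a custom observer (assuming a FunctionObserver interface with an onFunctionEvent callback and an observers function option):

import { generateText, openai } from "modelfusion";

const observer = {
  onFunctionEvent(event) {
    // called for each event in the run, e.g. started/finished model calls:
    console.log(`[${event.eventType}]`, event.functionType);
  },
};

const text = await generateText(
  openai.CompletionTextGenerator({ model: "gpt-3.5-turbo-instruct" }),
  "Write a haiku about logging.",
  { observers: [observer] }
);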

ModelFusion Server

Warning

ModelFusion Server is in its initial development phase and not feature-complete. The API is experimental and breaking changes are likely. Feedback and suggestions are welcome.

ModelFusion Server is designed for running multi-modal generative AI flows that take up to several minutes to complete. It provides the following benefits:

  • 🔄 Real-time progress updates via custom server-sent events
  • 🔒 Type-safety with Zod schemas for inputs and events
  • 📦 Efficient handling of dynamically created binary assets (images, audio)
  • 📜 Auto-logging for AI model interactions within flows

ModelFusion provides a Fastify plugin that allows you to set up a server that exposes your ModelFusion flows as REST endpoints using server-sent events.

import path from "node:path";
import {
  FileSystemAssetStorage,
  FileSystemLogger,
  modelFusionFastifyPlugin,
} from "modelfusion/fastify-server"; // '/fastify-server' import path

// assumes a Fastify instance (`fastify`), a flow (`exampleFlow`), and the
// values `fsBasePath` and `baseUrl` are defined elsewhere

// configurable logging for all runs using ModelFusion observability:
const logger = new FileSystemLogger({
  path: (run) => path.join(fsBasePath, run.runId, "logs"),
});

// configurable storage for large files like images and audio files:
const assetStorage = new FileSystemAssetStorage({
  path: (run) => path.join(fsBasePath, run.runId, "assets"),
  logger,
});

fastify.register(modelFusionFastifyPlugin, {
  baseUrl,
  basePath: "/myFlow",
  logger,
  assetStorage,
  flow: exampleFlow,
});

Using invokeFlow, you can easily connect your client to a ModelFusion flow endpoint:

import { invokeFlow } from "modelfusion/browser"; // '/browser' import path

invokeFlow({
  url: `${BASE_URL}/myFlow`,
  schema: myFlowSchema,
  input: { prompt },
  onEvent(event) {
    switch (event.type) {
      case "my-event": {
        // do something with the event
        break;
      }
      // more events...
    }
  },
  onStop() {
    // flow finished
  },
});

Documentation

More Examples

  • Examples for almost all of the individual functions and objects. Highly recommended to get started.
  • multi-modal, structure streaming, image generation, text-to-speech, speech-to-text, text generation, structure generation, embeddings: StoryTeller is an exploratory web application that creates short audio stories for pre-school kids.
  • terminal app, chat, llama.cpp: A chat with an AI assistant, implemented as a terminal app.
  • Next.js app, OpenAI GPT-3.5-turbo, streaming, abort handling: A web chat with an AI assistant, implemented as a Next.js app.
  • terminal app, PDF parsing, in-memory vector indices, retrieval-augmented generation, hypothetical document embedding: Ask questions about a PDF document and get answers from the document.
  • Next.js app, Stability AI image generation: Create a 19th century painting image for your input.
  • Next.js app, OpenAI Whisper: Record audio with push-to-talk and transcribe it using Whisper, implemented as a Next.js app. The app shows a list of the transcriptions.
  • speech streaming, OpenAI, ElevenLabs streaming, Vite, Fastify, ModelFusion Server: Given a prompt, the server returns both a text and a speech stream response.
  • terminal app, agent, BabyAGI: TypeScript implementation of the BabyAGI classic and BabyBeeAGI.
  • terminal app, ReAct agent, GPT-4, OpenAI functions, tools: Get answers to questions from Wikipedia, e.g. "Who was born first, Einstein or Picasso?"
  • terminal app, agent, tools, GPT-4: Small agent that solves middle school math problems. It uses a calculator tool to solve the problems.
  • terminal app, PDF parsing, recursive information extraction, in-memory vector index, style example retrieval, OpenAI GPT-4, cost calculation: Extracts information about a topic from a PDF and writes a tweet in your own style about it.
  • Cloudflare, OpenAI: Generate text on a Cloudflare Worker using ModelFusion and OpenAI.

Contributing

Read the ModelFusion contributing guide to learn about the development process, how to propose bugfixes and improvements, and how to build and test your changes.