Multi-modal Agent Starter

Create a cloud-hosted LLM Agent with custom personality, multi-modal tools, and memory.

This repository is designed to pair with this Agent Building Guidebook.

Getting Started

You can be up and running in under a minute. A full setup walk-through is here.

For localhost development with your own IDE:

Clone this repository, then set up a Python virtual environment with:

python3.8 -m venv .venv
source .venv/bin/activate
python3.8 -m pip install -r requirements.txt
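
As a quick sanity check after these steps, a short script can confirm that the interpreter is running inside the virtual environment and is recent enough. This is a minimal sketch using only the standard library; the function names `in_virtualenv` and `python_is_supported` are illustrative and are not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_is_supported(minimum=(3, 8)) -> bool:
    """Return True when the running interpreter is at least `minimum`."""
    # sys.version_info compares like a tuple, e.g. (3, 8, 10, ...) >= (3, 8).
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python 3.8 or newer:", python_is_supported())
```

Run it with the same interpreter you used to create `.venv`. If the first line reports `False`, the `source .venv/bin/activate` step probably did not take effect in the current shell.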

To use a GitHub Dev Container in your browser:

Visit https://github.dev/steamship-core/multimodal-agent-starter, then click on the "Cloud Container" icon at lower-left and re-open in a new Docker container.

To use a GitHub Dev Container on localhost, with Docker:

Just click here:

With your Python environment set up and the STEAMSHIP_API_KEY environment variable set, run:

PYTHONPATH=src python3.8 src/api.py
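
If the command fails straight away, a missing API key is a likely cause. The sketch below checks for the variable before you launch anything. It assumes only that the key is read from the `STEAMSHIP_API_KEY` environment variable, as stated above; the helper name `missing_settings` is hypothetical and is not part of the Steamship SDK.

```python
import os


def missing_settings(env=None, required=("STEAMSHIP_API_KEY",)):
    """Return the names of required variables that are unset or empty."""
    # Accept an explicit mapping so the check is easy to exercise in tests;
    # fall back to the real process environment otherwise.
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Set these environment variables first:", ", ".join(missing))
    else:
        print("Environment looks ready.")
```

An empty string counts as missing here, which catches the common case of exporting the variable without a value.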

This project can be deployed straight to the cloud. Simply type:

ship deploy

and follow the prompts.

Tools help your agent perform actions or fetch information from the outside world. The Steamship SDK includes a large set of multi-modal & memory-aware tools you can use right away.

Your starter project already has a few tools in src/example_tools.
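
To make the idea of a tool concrete, here is a plain-Python sketch of the pattern: a named, described action that an agent can look up and call. It does not use the Steamship SDK, and every name in it (`SimpleTool`, `toolbox`, `use_tool`, and the two example tools) is hypothetical. The tools in src/example_tools follow the SDK's own interfaces, which may look quite different.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool: a name, a description, an action."""

    name: str
    description: str  # what an agent would read to decide when to use it
    action: Callable[[str], str]

    def run(self, text: str) -> str:
        """Apply the tool's action to the input text."""
        return self.action(text)


# Two toy tools, loosely inspired by the categories listed in this README.
knock_knock = SimpleTool(
    name="KnockKnockTool",
    description="Starts a knock-knock joke.",
    action=lambda _text: "Knock knock!",
)
word_count = SimpleTool(
    name="WordCountTool",
    description="Counts the words in a piece of text.",
    action=lambda text: str(len(text.split())),
)

# A registry keyed by tool name, so a tool can be selected at runtime.
toolbox: Dict[str, SimpleTool] = {
    tool.name: tool for tool in (knock_knock, word_count)
}


def use_tool(name: str, text: str) -> str:
    """Look up a tool by name and run it on the given text."""
    if name not in toolbox:
        raise KeyError(f"unknown tool: {name}")
    return toolbox[name].run(text)


if __name__ == "__main__":
    print(use_tool("KnockKnockTool", ""))
    print(use_tool("WordCountTool", "turn audio into text"))
```

The design point is that each tool carries its own description, so whatever selects a tool can choose from the descriptions alone without knowing how the tool is implemented.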

You can also find and import more open-source tools in the Steamship SDK:

Audio Transcription:
- Assembly AI - Turns audio into text
- Whisper - Turns audio into text
- RSS Download - Returns Audio URLs from an RSS feed
Classification:
- Sentiment Analysis - Can report on the sentiment of a piece of text
- Zero Shot Classification - Can classify a piece of text
Image Generation:
- DALL-E - Generate images with DALL-E
- Stable Diffusion - Generate images with Stable Diffusion
- Google Image Search - Perform a Google Image Search and return the results
Speech Generation:
- Eleven Labs - Turn text into the spoken word
Search:
- Google Search - Find answers to questions on the web
Question Answering:
- Vector Search QA - Find answers to questions in the Steamship Vector Database
- Prompt Database QA - Find answers to questions from a pre-loaded prompt database
Text Generation:
- Image Prompt Generation - Rewrite a topic into a Stable Diffusion image prompt
- Personality Tool - Reword a response according to a particular personality
- Text Summarization - Summarize text
- Text Rewriter - Utility tool for building tools that use prompts to operate
- Translation - Translate text using an LLM
Conversation Starters:
- Knock-Knock Joke Starter - Initiate a knock knock joke. The world's most useful tool.