This project is a self-hosted version of OpenAI's new stateful Assistants API. 💪
All API route definitions and types are 100% auto-generated from OpenAI's official OpenAPI spec, so all it takes to switch between the official API and your custom API is changing the baseURL. 🤯
This means that all API parameters, responses, and types are wire-compatible with the official OpenAI API, and the fact that they're auto-generated means that it will be relatively easy to keep them in sync over time.
Here's an example using the official Node.js openai package:
import OpenAI from 'openai'
// The only difference is the `baseURL` pointing to your custom API server 🔥
const openai = new OpenAI({
baseURL: 'http://localhost:3000'
})
// Since the custom API is spec-compliant with OpenAI, you can use the sdk normally 💯
const assistant = await openai.beta.assistants.create({
model: 'gpt-4-1106-preview',
instructions: 'You are a helpful assistant.'
})
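Since the baseURL is the only difference, switching between the official API and a self-hosted server can come down to a single environment variable. Here's a minimal sketch of one way to do that, assuming the OPENAI_API_BASE_URL variable used by the e2e scripts in this repo. The `clientOptions` helper is a hypothetical name for illustration, not part of this project or the openai package.

```typescript
type Env = Record<string, string | undefined>

interface ClientOptions {
  apiKey?: string
  baseURL?: string
}

// Hypothetical helper: builds the options object passed to `new OpenAI(...)`.
function clientOptions(env: Env): ClientOptions {
  const options: ClientOptions = {}
  if (env.OPENAI_API_KEY) {
    options.apiKey = env.OPENAI_API_KEY
  }

  // Only override baseURL when a custom server is configured; otherwise the
  // SDK falls back to the official API at https://api.openai.com/v1.
  const baseURL = env.OPENAI_API_BASE_URL?.trim()
  if (baseURL) {
    options.baseURL = baseURL
  }

  return options
}

// Usage (requires the `openai` package):
//   import OpenAI from 'openai'
//   const openai = new OpenAI(clientOptions(process.env))
```

Leaving OPENAI_API_BASE_URL unset targets the official API, and setting it to something like http://localhost:3000 targets your own server, with no other code changes.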
Python example
Here's the same example using the official Python openai package:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:3000"
)
# Now you can use the sdk normally!
# (only file and beta assistant resources are currently supported)
# You can even switch back and forth between the official and custom APIs!
assistant = client.beta.assistants.create(
model="gpt-4-1106-preview",
instructions="You are a helpful assistant."
)
Note that this project is not meant to be a full recreation of the entire OpenAI API. Rather, it is focused only on the stateful portions of the new Assistants API. The following resource types are supported:
- Assistants
- AssistantFiles
- Files
- Messages
- MessageFiles
- Threads
- Runs
- RunSteps
See the official OpenAI Assistants Guide for more info on how Assistants work.
Being able to run your own, custom OpenAI Assistants that are 100% compatible with the official OpenAI Assistants unlocks all sorts of useful possibilities:
- Using OpenAI Assistants with custom models (OSS ftw!) 💪
- Fully customizable RAG via the built-in retrieval tool (LangChain and LlamaIndex integrations coming soon)
- Using a custom code interpreter like open-interpreter 🔥
- Self-hosting / on-premise deployments of Assistants
- Full control over assistant evals
- Developing & testing GPTs in fully sandboxed environments
- Sandboxed testing of custom Actions before deploying to the OpenAI "GPT Store"
Most importantly, if the OpenAI "GPT Store" ends up gaining traction with ChatGPT's 100M weekly active users, then the ability to reliably run, debug, and customize OpenAI-compatible Assistants will end up being incredibly important in the future.
I could even imagine a future Assistant store which is fully compatible with OpenAI's GPTs, but instead of relying on OpenAI as the gatekeeper, it could be fully or partially decentralized. 💯
- Postgres - Primary datastore via Prisma (schema file)
- Redis - Backing store for the async task queue used to process thread runs via BullMQ
- S3 - Stores uploaded files
  - Any S3-compatible storage provider is supported, such as Cloudflare R2
- Hono - Serves the REST API via @hono/zod-openapi
  - We're using the Node.js adapter by default, but Hono supports many environments including CF workers, Vercel, Netlify, Deno, Bun, Lambda, etc.
- Dexter - Production RAG by Dexa
- TypeScript 💕
Prerequisites:
Install deps:
pnpm install
Generate the prisma types locally:
pnpm generate
cp .env.example .env
- Postgres
  - DATABASE_URL - Postgres connection string
  - On macOS: brew install postgresql && brew services start postgresql
  - You'll need to run npx prisma db push to set up your database according to our prisma schema
- OpenAI
  - OPENAI_API_KEY - OpenAI API key for running the underlying chat completion calls
  - This is required for now, but depending on how interested people are, it won't be hard to add support for local models and other providers
- Redis
  - On macOS: brew install redis && brew services start redis
  - If you have a local redis instance running, the default redis env vars should work without touching them
  - REDIS_HOST - Optional; defaults to localhost
  - REDIS_PORT - Optional; defaults to 6379
  - REDIS_USERNAME - Optional; defaults to default
  - REDIS_PASSWORD - Optional
- S3 - Required to use file attachments
  - Any S3-compatible provider is supported, such as Cloudflare R2
  - Alternatively, you can use a local S3 server like MinIO or LocalStack
    - To run LocalStack on macOS: brew install localstack/tap/localstack-cli && localstack start -d
    - To run MinIO on macOS: brew install minio/stable/minio && minio server /data
  - I recommend using Cloudflare R2, though. It's amazing and should be free for most use cases!
  - S3_BUCKET - Required
  - S3_REGION - Optional; defaults to auto
  - S3_ENDPOINT - Required; example: https://<id>.r2.cloudflarestorage.com
  - ACCESS_KEY_ID - Required (Cloudflare R2 docs)
  - SECRET_ACCESS_KEY - Required (Cloudflare R2 docs)
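To make the required and optional variables above easier to see at a glance, here's a rough sketch of a config loader that applies the documented defaults and fails fast on missing values. This is not the project's actual config code: `loadConfig`, `required`, and the `AppConfig` shape are hypothetical names, and the sketch requires the S3 variables unconditionally even though they are only needed for file attachments.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical config shape; field names are assumptions for illustration.
interface AppConfig {
  databaseUrl: string
  openaiApiKey: string
  redis: {
    host: string
    port: number
    username: string
    password: string | undefined
  }
  s3: {
    bucket: string
    region: string
    endpoint: string
    accessKeyId: string
    secretAccessKey: string
  }
}

// Returns the value of a required variable or throws a descriptive error.
function required(env: Env, name: string): string {
  const value = env[name]
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

function loadConfig(env: Env): AppConfig {
  return {
    databaseUrl: required(env, 'DATABASE_URL'),
    openaiApiKey: required(env, 'OPENAI_API_KEY'),
    redis: {
      // Defaults match the ones documented above.
      host: env.REDIS_HOST ?? 'localhost',
      port: Number(env.REDIS_PORT ?? 6379),
      username: env.REDIS_USERNAME ?? 'default',
      password: env.REDIS_PASSWORD
    },
    s3: {
      bucket: required(env, 'S3_BUCKET'),
      region: env.S3_REGION ?? 'auto',
      endpoint: required(env, 'S3_ENDPOINT'),
      accessKeyId: required(env, 'ACCESS_KEY_ID'),
      secretAccessKey: required(env, 'SECRET_ACCESS_KEY')
    }
  }
}

// Usage: const config = loadConfig(process.env)
```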
The app is composed of two services: a RESTful API server and an async task runner. Both services are stateless and can be scaled horizontally.
There are two ways to run these services locally. The quickest way is via tsx:
# Start the REST API server in one shell
npx tsx src/server
# Start an async task queue runner in another shell
npx tsx src/runner
Alternatively, you can transpile the source TS to JS first, which is preferred for running in production:
pnpm build
# Start the REST API server in one shell
npx tsx dist/server
# Start an async task queue runner in another shell
npx tsx dist/runner
This example contains an end-to-end assistant script which uses a custom get_weather function.
You can run it using the official openai client for Node.js against the default OpenAI API hosted at https://api.openai.com/v1.
npx tsx e2e
To run the same test suite against your local API, you can run:
OPENAI_API_BASE_URL='http://127.0.0.1:3000' npx tsx e2e
It's pretty cool to see both test suites running the exact same Assistants code using the official OpenAI Node.js client – without any noticeable differences between the two versions. Huzzah! 🥳
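For a feel of what the function-calling part of such a script involves, here's a minimal sketch of a get_weather tool definition and a handler that turns pending tool calls into tool outputs. The parameter schema, the canned weather data, and the `handleToolCalls` helper are assumptions for illustration; the actual e2e script may define things differently. The local types mirror the function tool-call shape of the Assistants API.

```typescript
// Minimal local types mirroring the Assistants API function tool-call shape.
interface FunctionToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string } // `arguments` is a JSON string
}

interface ToolOutput {
  tool_call_id: string
  output: string
}

// Tool definition for the assistant's `tools` array.
// The parameter schema is an assumption for illustration.
const getWeatherTool = {
  type: 'function' as const,
  function: {
    name: 'get_weather',
    description: 'Get the current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name' }
      },
      required: ['location']
    }
  }
}

// Hypothetical implementation: returns canned data instead of calling a
// real weather service.
function getWeather(args: { location: string }): string {
  return JSON.stringify({
    location: args.location,
    temperature: 22,
    unit: 'celsius'
  })
}

// Maps each pending tool call to the output format expected by
// POST /threads/:thread_id/runs/:run_id/submit_tool_outputs.
function handleToolCalls(toolCalls: FunctionToolCall[]): ToolOutput[] {
  return toolCalls.map((call) => {
    if (call.function.name !== getWeatherTool.function.name) {
      throw new Error(`Unknown function: ${call.function.name}`)
    }
    const args = JSON.parse(call.function.arguments) as { location: string }
    return { tool_call_id: call.id, output: getWeather(args) }
  })
}
```

When a run pauses with pending tool calls, the array returned by `handleToolCalls` is what gets submitted back as tool_outputs, and the same code works against either API.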
This example contains an end-to-end assistant script which uses the built-in retrieval tool with this readme.md file as an attachment.
You can run it using the official openai client for Node.js against the default OpenAI API hosted at https://api.openai.com/v1.
npx tsx e2e/retrieval.ts
To run the same test suite against your local API, you can run:
OPENAI_API_BASE_URL='http://127.0.0.1:3000' npx tsx e2e/retrieval.ts
The output will likely differ slightly due to differences in OpenAI's built-in retrieval implementation and our default, naive retrieval implementation.
Note that the current retrieval implementation only supports text files like text/plain and markdown, as no preprocessing or conversions are done at the moment. We also use a very naive retrieval method which always returns the full file contents, as opposed to pre-processing them and only returning the most semantically relevant chunks. See this issue for more info.
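To make that difference concrete, here's a small sketch contrasting "return everything" with a chunk-and-rank approach. Neither function is the project's actual code, and all names are hypothetical. The ranking uses plain keyword overlap as a simple stand-in for semantic similarity, which a real implementation would more likely compute with embeddings.

```typescript
interface TextFile {
  name: string
  content: string
}

// Naive approach as described above: every file is returned in full.
function retrieveFullContents(files: TextFile[]): string[] {
  return files.map((file) => file.content)
}

// Splits text into paragraph chunks on blank lines.
function chunkText(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
}

// Lowercased alphanumeric words, used for overlap scoring.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? []
}

// Ranks chunks by how many distinct query words they contain and returns
// the top `k`. Chunks with no overlap are dropped.
function retrieveTopChunks(files: TextFile[], query: string, k: number): string[] {
  const queryWords = Array.from(new Set(tokenize(query)))
  const scored: { chunk: string; score: number }[] = []

  for (const file of files) {
    for (const chunk of chunkText(file.content)) {
      const chunkWords = new Set(tokenize(chunk))
      const score = queryWords.filter((word) => chunkWords.has(word)).length
      if (score > 0) {
        scored.push({ chunk, score })
      }
    }
  }

  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.chunk)
}
```

The second approach keeps prompts small for large files, at the cost of possibly missing relevant passages when the scoring is weak.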
GET /files
POST /files
DELETE /files/:file_id
GET /files/:file_id
GET /files/:file_id/content
GET /assistants
POST /assistants
GET /assistants/:assistant_id
POST /assistants/:assistant_id
DELETE /assistants/:assistant_id
GET /assistants/:assistant_id/files
POST /assistants/:assistant_id/files
DELETE /assistants/:assistant_id/files/:file_id
GET /assistants/:assistant_id/files/:file_id
POST /threads
GET /threads/:thread_id
POST /threads/:thread_id
DELETE /threads/:thread_id
GET /threads/:thread_id/messages
POST /threads/:thread_id/messages
GET /threads/:thread_id/messages/:message_id
POST /threads/:thread_id/messages/:message_id
GET /threads/:thread_id/messages/:message_id/files
GET /threads/:thread_id/messages/:message_id/files/:file_id
GET /threads/:thread_id/runs
POST /threads/runs
POST /threads/:thread_id/runs
GET /threads/:thread_id/runs/:run_id
POST /threads/:thread_id/runs/:run_id
POST /threads/:thread_id/runs/:run_id/submit_tool_outputs
POST /threads/:thread_id/runs/:run_id/cancel
GET /threads/:thread_id/runs/:run_id/steps
GET /threads/:thread_id/runs/:run_id/steps/:step_id
GET /openapi
You can view the server's auto-generated openapi spec by running the server and then visiting http://127.0.0.1:3000/openapi
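If you'd rather inspect the spec programmatically, here's a minimal sketch that lists the routes found in an OpenAPI document. It assumes the /openapi endpoint returns a JSON document with a standard `paths` object; `listRoutes` is a hypothetical helper, not part of this project. Note that OpenAPI writes path parameters as `{file_id}` rather than `:file_id`.

```typescript
// Minimal subset of an OpenAPI document: only the `paths` object is used.
interface OpenAPIDocument {
  paths: Record<string, Record<string, unknown>>
}

const HTTP_METHODS = ['get', 'post', 'put', 'patch', 'delete']

// Returns entries like "GET /files/{file_id}", sorted by path.
function listRoutes(spec: OpenAPIDocument): string[] {
  const routes: string[] = []

  for (const path of Object.keys(spec.paths).sort()) {
    const operations = spec.paths[path] ?? {}
    for (const method of HTTP_METHODS) {
      // Path items can hold non-method keys (e.g. `parameters`), so only
      // known HTTP methods are reported.
      if (method in operations) {
        routes.push(`${method.toUpperCase()} ${path}`)
      }
    }
  }

  return routes
}

// Usage against a running server (Node.js 18+ for the global fetch):
//   const res = await fetch('http://127.0.0.1:3000/openapi')
//   const spec = (await res.json()) as OpenAPIDocument
//   console.log(listRoutes(spec).join('\n'))
```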
Status: All API routes have been tested side-by-side with the official OpenAI API and are working as expected. The only missing features at the moment are support for the built-in code_interpreter tool (issue) and support for non-text files with the built-in retrieval tool (issue). All other functionality should be fully supported and wire-compatible with the official API.
TODO:
- hosted demo (bring your own OpenAI API key?)
- get hosted redis working
- handle locking thread and messages
- not sure how this works exactly, but according to the OpenAI Assistants Guide, threads are locked while runs are being processed
- built-in code_interpreter tool (issue)
- support non-text files w/ built-in retrieval tool (issue)
- openai uses prefix IDs for its resources, which would be great, except it's a pain to get working with Prisma (issue)
- figure out why localhost resolution wasn't working for #6
- handle context overflows (truncation for now)
MIT © Travis Fischer
If you found this project useful, please consider sponsoring me or following me on Twitter.