OpenAI API mock / proxy Worker

There's now official Cloudflare (Workers) AI support for OpenAI comptabile API endpoints, I recommend using this instead: OpenAI compatible API endpoints

This project aims to convert/proxy Cloudflare Workers AI responses to OpenAI API compatible responses so that Cloudflare Workers AI models can be used with any OpenAI/ChatGPT compatible client.

Supports streaming and non-streaming responses
Rewrites default models such as gpt-3 and gpt-4 to use @cf/meta/llama-3-8b-instruct
If the OpenAI client can be configured to use other model names, simply replace gpt-4 with the Cloudflare model ID
Here's a list of all Cloudflare Workers AI models

installation

create a Cloudflare Account
clone this repo
run npm run deploy
generate an API key and add it to your project: npx wrangler secret put token

after the script has been deployed, you'll get an URL which you can use as your OpenAI API endpoint for other applications, something like this: https://openai-api.foobar.workers.dev

examples

use with llm

I mainly created this project to make it work with the awesome LLM project from Simon Willison

go ahead and read everything about LLM here
how to install LLM
since LLM can work with OpenAI-compatible models, we're adding our OpenAI API proxy like this:

find the directory of your llm configuration: dirname "$(llm logs path)"
create this file: vi ~/Library/Application\ Support/io.datasette.llm/extra-openai-models.yaml

- model_id: cloudflare
  model_name: '@hf/thebloke/llama-2-13b-chat-awq'
  api_base: 'https://openai-api.foobar.workers.dev/'
  api_key_name: cloudflare

you can also add multiple models there:

- model_id: cfllama2
  model_name: '@cf/meta/llama-2-7b-chat-fp16'
  api_base: 'https://openai-api.foobar.workers.dev'
  api_key_name: cloudflare
- model_id: cfllama3
  model_name: '@cf/meta/llama-3-8b-instruct'
  api_base: 'https://openai-api.foobar.workers.dev'
  api_key_name: cloudflare

set the API key in LLM: llm keys set cloudflare to the one you configured in the Worker

use it with streaming (recommended):

llm chat -m cfllama3

use it without streaming:

llm chat --no-stream -m cfllama3

use with chatblade

chatblade is a cool CLI utility for ChatGPT. It was a bit harder to configure to use it with a custom endpoint and model, but this seems to work:

export OPENAI_API_KEY="your-own-auth-key" # your worker secret api key
export OPENAI_API_AZURE_ENGINE="@cf/meta/llama-3-8b-instruct" # the model you want to use
export OPENAI_API_VERSION="@cf/meta/llama-3-8b-instruct" # again, the model you want to use
export AZURE_OPENAI_ENDPOINT="https://openai-api.foobar.workers.dev/" # your workers endpoint
export OPENAI_API_TYPE=azure # I don't know why this is required

use Chatblade like this then:

chatblade -i -c "@cf/meta/llama-3-8b-instruct" # interactive mode as a chat
chatblade -c "@cf/meta/llama-3-8b-instruct" "tell a joke" # single prompt

use with Pal Chat on iOS

There's Pal Chat for iOS which can be used with custom endpoints.