OpenAI API for Google Cloud Vertex AI

Public Archive Notice

This repository has been transitioned to a public archive. While development has ceased, the codebase remains available for reference and historical purposes.

Impact on Current Users:

New features and bug fixes will no longer be implemented.
Issues and pull requests will not be reviewed or merged.
This container image will be delisted from Docker Hub in August 2024. Please make alternative arrangements before then.

Accessing the Codebase:

You can continue to clone, fork, and explore the code at your convenience.
The codebase reflects the repository's state at the time of archiving.

Staying Informed:

I recommend considering alternative projects that are actively maintained for your ongoing development needs.

Contributing:

While new contributions are no longer accepted in this repository, you are welcome to fork the codebase and create your own derivative project.

Thank You:

I appreciate your past contributions and interest in this project. I hope the archived codebase remains a valuable resource!

This project is a drop-in replacement REST API for Vertex AI (PaLM 2, Codey, Gemini) that is compatible with the OpenAI API specifications.

Examples:

Screenshot: Chat with Gemini in Chatbot UI
Screenshot: Get help from Gemini in VSCode

This project is inspired by the idea of LocalAI, but focuses on making Google Cloud Platform Vertex AI PaLM more accessible to everyone.

A Google Cloud Run service is deployed that translates OpenAI API calls into Vertex AI (PaLM 2, Codey, Gemini) calls.

Diagram: OpenAI, Google Cloud Run and Vertex AI
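Conceptually, the translation step maps an OpenAI chat request onto a Vertex AI request. The sketch below is not this project's code (which uses LangChain); it assumes the Gemini REST request shape, in which chat turns are `contents` entries with the roles `user` and `model`, system prompts go into a separate `systemInstruction`, and sampling settings go into `generationConfig`. The helper name `openai_to_gemini` is hypothetical.

```python
"""Hypothetical sketch: map an OpenAI chat request body onto a
Gemini-style Vertex AI request body. Not this project's implementation."""

from typing import Any, Dict, List

# OpenAI role -> assumed Gemini role
ROLE_MAP = {"user": "user", "assistant": "model"}

# OpenAI sampling field -> assumed Gemini generationConfig field
FIELD_MAP = {"temperature": "temperature", "top_p": "topP", "max_tokens": "maxOutputTokens"}


def openai_to_gemini(body: Dict[str, Any]) -> Dict[str, Any]:
    """Translate an OpenAI /v1/chat/completions body (assumed shape)."""
    contents: List[Dict[str, Any]] = []
    system_parts: List[Dict[str, str]] = []
    for message in body.get("messages", []):
        role = message["role"]
        text = message.get("content", "")
        if role == "system":
            # Assumption: system prompts become a separate system instruction.
            system_parts.append({"text": text})
            continue
        if role not in ROLE_MAP:
            raise ValueError(f"unsupported role: {role!r}")
        contents.append({"role": ROLE_MAP[role], "parts": [{"text": text}]})

    request: Dict[str, Any] = {"contents": contents}
    if system_parts:
        request["systemInstruction"] = {"parts": system_parts}

    # Copy only the sampling parameters the client actually sent.
    config = {gem: body[oai] for oai, gem in FIELD_MAP.items() if oai in body}
    if config:
        request["generationConfig"] = config
    return request
```

Fields the sketch does not map (such as `model`) are ignored; a real service would still have to pick the Vertex AI model, for example from MODEL_NAME.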

Supported OpenAI API services:

OpenAI service	Endpoint	Supported
List models	`/v1/models`	✅
Chat Completions	`/v1/chat/completions`	✅
Completions (Legacy)	`/v1/completions`	❌
Embeddings	`/v1/embeddings`	❌
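Since `/v1/models` follows the OpenAI list format, clients can discover the available models with a plain GET. A standard-library sketch, assuming the response has the OpenAI shape `{"object": "list", "data": [{"id": ...}, ...]}`; `models_request` and `model_ids` are hypothetical helpers, and the URL and key are placeholders.

```python
import json
import urllib.request
from typing import List


def models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for /v1/models."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: str) -> List[str]:
    """Extract model IDs from an OpenAI-style list response (assumed shape)."""
    return [item["id"] for item in json.loads(payload).get("data", [])]
```

To send it: `with urllib.request.urlopen(models_request("http://localhost:8000", "sk-XYZ")) as resp: print(model_ids(resp.read().decode()))`.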

The software is developed in Python and based on FastAPI and LangChain.

Everything is designed to be very simple, so you can easily adjust the source code to your individual needs.

Step-by-Step Guide

The Jupyter notebook Vertex_AI_Chat.ipynb contains step-by-step instructions for deploying the API backend and the Chatbot UI frontend as Google Cloud Run services.

Deploying to Cloud Run

Requirements:

The user account used for deployment must have sufficient permissions in the project. For a fast and hassle-free deployment, the "Owner" role is recommended.

In addition, the default compute service account ([PROJECT_NR]-compute@developer.gserviceaccount.com) must have the "Vertex AI User" role (roles/aiplatform.user).
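If the service account lacks this role, it can be granted with `gcloud projects add-iam-policy-binding`; the project number can be looked up with `gcloud projects describe [PROJECT_ID] --format="value(projectNumber)"`. The Python sketch below only assembles the grant command as an argument list so you can review it first; `grant_vertex_user_cmd` is a hypothetical helper, and the project ID and number are values you supply.

```python
from typing import List


def grant_vertex_user_cmd(project_id: str, project_number: str) -> List[str]:
    """Assemble the gcloud command that grants roles/aiplatform.user
    to the default compute service account (hypothetical helper)."""
    if not project_number.isdigit():
        raise ValueError("project_number must be the numeric project number")
    member = f"serviceAccount:{project_number}-compute@developer.gserviceaccount.com"
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={member}",
        "--role=roles/aiplatform.user",
    ]
```

After checking the output, run it with `subprocess.run(cmd, check=True)`.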

Authenticate:

gcloud auth login

Set default project:

gcloud config set project [PROJECT_ID]

Run the following script to create a container image and deploy that container as a public API (which allows unauthenticated calls) in Google Cloud Run:

bash deploy.sh

Note: You can change the generated fake OpenAI API key and Google Cloud region with environment variables:
export OPENAI_API_KEY="sk-XYZ"
export GOOGLE_CLOUD_LOCATION="europe-west1"
bash deploy.sh
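Because the service accepts unauthenticated calls at the Cloud Run level, the fake API key is the shared secret between your clients and this API. A sketch of how such a key could be generated and checked, assuming the `sk-` plus random hex format listed for OPENAI_API_KEY in the Configuration section; deploy.sh may generate it differently, and `make_fake_key` and `is_authorized` are hypothetical names.

```python
import hmac
import secrets
from typing import Optional


def make_fake_key(nbytes: int = 16) -> str:
    """Return an OpenAI-style key: 'sk-' plus random hex (2 characters per byte).
    The format is an assumption based on the documented default sk-[RANDOM_HEX]."""
    return "sk-" + secrets.token_hex(nbytes)


def is_authorized(authorization_header: Optional[str], expected_key: str) -> bool:
    """Check an 'Authorization: Bearer <key>' header in constant time."""
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking the key through timing differences.
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())
```

The constant-time comparison matters here because the key is the only gate in front of a public endpoint.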

Running Locally

The software was tested on GNU/Linux and macOS with Python 3.11 and 3.12.3 (3.12.4 currently does not work). On Windows, you must set environment variables with set instead of export.

You should also create a virtual environment with the version of Python you want to use, and activate it before proceeding.

You also need the Google Cloud CLI, which includes the gcloud command-line tool.

Create a Python virtual environment and install the requirements:

python3 -m venv .venv && \
source .venv/bin/activate && \
pip install -r requirements.txt

Authenticate:

gcloud auth application-default login

Set the quota project for Application Default Credentials:

gcloud auth application-default set-quota-project [PROJECT_ID]

Run with the default model:

export DEBUG="True"
export OPENAI_API_KEY="sk-XYZ"
uvicorn vertex:app --reload

Example for Windows:

set DEBUG=True
set OPENAI_API_KEY=sk-XYZ
uvicorn vertex:app --reload

Run with the Gemini model gemini-pro:

export DEBUG="True"
export OPENAI_API_KEY="sk-XYZ"
export MODEL_NAME="gemini-pro"
uvicorn vertex:app --reload

Run with the Codey model codechat-bison-32k:

export DEBUG="True"
export OPENAI_API_KEY="sk-XYZ"
export MODEL_NAME="codechat-bison-32k"
export MAX_OUTPUT_TOKENS="16000"
uvicorn vertex:app --reload

The application will now be running on your local computer. You can access it by opening a web browser and navigating to the following address:

http://localhost:8000/
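The environment variables used above are read when the app starts. A sketch of parsing them with the defaults listed in the Configuration section; `load_config` is a hypothetical helper, not the project's code, and accepting `true`, `1`, or `yes` (any case) for DEBUG is an assumption.

```python
import os
import secrets
from typing import Any, Dict, Mapping, Optional


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read settings from environment variables, falling back to the
    defaults from the Configuration table (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # Assumption: "True", "true", "1" or "yes" switch debug output on.
        "debug": env.get("DEBUG", "False").strip().lower() in {"true", "1", "yes"},
        "location": env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
        # None means: fall back to the project of the authentication credentials.
        "project_id": env.get("GOOGLE_CLOUD_PROJECT_ID") or None,
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),
        "model_name": env.get("MODEL_NAME", "chat-bison"),
        "max_output_tokens": int(env.get("MAX_OUTPUT_TOKENS", "512")),
        "temperature": float(env.get("TEMPERATURE", "0.2")),
        "top_k": int(env.get("TOP_K", "40")),
        "top_p": float(env.get("TOP_P", "0.8")),
        # Assumption: the fallback key is "sk-" plus 32 random hex characters.
        "api_key": env.get("OPENAI_API_KEY") or "sk-" + secrets.token_hex(16),
    }
```

For example, `load_config({"MODEL_NAME": "gemini-pro"})["model_name"]` returns `"gemini-pro"`, while unset variables keep their defaults.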

Usage

HTTP request and response formats are consistent with the OpenAI API.

For example, to generate a chat completion, send a POST request to the /v1/chat/completions endpoint with the messages in the JSON request body:

curl --location 'http://[ENDPOINT]/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer [API-KEY]' \
--data '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Say this is a test!"
      }
    ]
  }'
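The same request can be built in Python with only the standard library. In this sketch, `chat_request` is a hypothetical helper that builds the request without sending it; the endpoint and key are placeholders for your Cloud Run URL (or http://localhost:8000) and your API key.

```python
import json
import urllib.request


def chat_request(endpoint: str, api_key: str, content: str,
                 model: str = "gpt-3.5-turbo") -> urllib.request.Request:
    """Build a POST to /v1/chat/completions that mirrors the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Send it with `urllib.request.urlopen(chat_request(...))` and decode the response body with `json.load`.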

Response:

{
  "id": "cmpl-efccdeb3d2a6cfe144fdde11",
  "created": 1691577522,
  "object": "chat.completion",
  "model": "gpt-3.5-turbo",
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Sure, this is a test."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ]
}
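The reply text sits at `choices[0].message.content`. A small sketch that extracts it; `reply_text` is a hypothetical helper. It ignores the `usage` block, whose counters are all zero in the example above.

```python
import json


def reply_text(payload: str) -> str:
    """Return the assistant message from an OpenAI-style chat completion."""
    data = json.loads(payload)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]
```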

Bruno API client

Download the collection export for the Bruno API client: bruno-export.json

Configuration

The software is configured with environment variables.

The following variables are available, with their default values:

Variable	Default	Description
DEBUG	False	Show debug messages that help during development.
GOOGLE_CLOUD_LOCATION	us-central1	Google Cloud Platform region for API calls.
GOOGLE_CLOUD_PROJECT_ID	[DEFAULT_AUTH_PROJECT]	Identifier of your Google Cloud project. If not specified, the project of the authentication credentials is used.
HOST	0.0.0.0	Bind socket to this host.
MAX_OUTPUT_TOKENS	512	Token limit that determines the maximum amount of text output from one prompt. Can be overridden by the end user as required by the OpenAI API specification.
MODEL_NAME	chat-bison	One of the foundation models available in Vertex AI.
OPENAI_API_KEY	sk-[RANDOM_HEX]	Self-generated fake OpenAI API key used to authenticate requests against the application.
PORT	8000	Bind socket to this port.
TEMPERATURE	0.2	Sampling temperature; controls the degree of randomness in token selection. Can be overridden by the end user as required by the OpenAI API specification.
TOP_K	40	Changes how the model selects tokens for output: the next token is selected from the K most probable tokens.
TOP_P	0.8	Tokens are selected from most probable to least until the sum of their probabilities equals the top-p value. Can be overridden by the end user as required by the OpenAI API specification.
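To illustrate TOP_K and TOP_P (defaults 40 and 0.8): the candidate tokens are first limited to the K most probable ones, then to the smallest most-probable-first subset whose probabilities add up to at least P, and the next token is sampled from what remains. The sketch below applies both filters to a toy distribution and renormalizes the survivors; it illustrates the technique and is not Vertex AI's implementation. `top_k_top_p` is a hypothetical name.

```python
from typing import Dict


def top_k_top_p(probs: Dict[str, float], k: int, p: float) -> Dict[str, float]:
    """Keep the k most probable tokens, then the shortest most-probable-first
    prefix whose cumulative probability reaches p; renormalize the rest."""
    if k < 1 or not 0.0 < p <= 1.0:
        raise ValueError("need k >= 1 and 0 < p <= 1")
    # Top-k: sort by probability (descending) and keep the first k tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    kept = []
    total = 0.0
    # Top-p: accumulate until the running sum reaches p.
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {token: prob / total for token, prob in kept}
```

With a toy distribution {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}, k=3 and p=0.75 keep only "a" and "b" (0.5 + 0.3 reaches 0.75), rescaled to 0.625 and 0.375.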

OpenAI Client Library

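If your application uses client libraries provided by OpenAI, you only need to point them at your Google Cloud Run endpoint URL and use the generated API key. The OpenAI Python library before version 1.0 reads the base URL from the OPENAI_API_BASE environment variable; version 1.0 and later reads OPENAI_BASE_URL instead:

export OPENAI_API_KEY="sk-XYZ"
export OPENAI_API_BASE="https://openai-api-vertex-XYZ.a.run.app/v1"
export OPENAI_BASE_URL="https://openai-api-vertex-XYZ.a.run.app/v1"
python your_openai_app.py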
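To launch such an app from Python instead of the shell, here is a sketch that adds the key and both base-URL variables (OPENAI_API_BASE and OPENAI_BASE_URL, covering older and newer `openai` releases) to the child process environment. `openai_env` and `run_app` are hypothetical helpers, and `your_openai_app.py` is the placeholder script name.

```python
import os
import subprocess
import sys
from typing import Dict


def openai_env(cloud_run_url: str, api_key: str) -> Dict[str, str]:
    """Environment variables that point an OpenAI client at this API."""
    base = cloud_run_url.rstrip("/")
    if not base.endswith("/v1"):
        base += "/v1"
    return {
        "OPENAI_API_KEY": api_key,
        "OPENAI_API_BASE": base,  # read by openai < 1.0
        "OPENAI_BASE_URL": base,  # read by openai >= 1.0
    }


def run_app(script: str, cloud_run_url: str, api_key: str) -> int:
    """Run a Python script with the variables above added to its environment."""
    env = {**os.environ, **openai_env(cloud_run_url, api_key)}
    return subprocess.run([sys.executable, script], env=env, check=False).returncode
```

For example: `run_app("your_openai_app.py", "https://openai-api-vertex-XYZ.a.run.app", "sk-XYZ")`.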

Chatbot UI

When deploying the Chatbot UI application, the following environment variables must be set:

Variable	Value
OPENAI_API_KEY	API key generated during deployment
OPENAI_API_HOST	Google Cloud Run URL

Deploying Chatbot UI to Cloud Run

Run the following script to create a container image from the GitHub source code and deploy that container as a public website (which allows unauthenticated calls) in Google Cloud Run:

export OPENAI_API_KEY="sk-XYZ"
export OPENAI_API_HOST="https://openai-api-vertex-XYZ.a.run.app"
bash chatbot-ui.sh

Chatbox

Set the following Chatbox settings:

Setting	Value
AI Provider	OpenAI API
OpenAI API Key	API key generated during deployment
API Host	Google Cloud Run URL

VSCode-OpenAI

The VSCode-OpenAI extension integrates OpenAI features into the VSCode editor.

To open the setup, you have two options:

run the command "vscode-openai.configuration.show.quickpick", or
click the vscode-openai status bar item in the bottom-left corner of VSCode.

During setup, select openai.com and enter the Google Cloud Run URL with /v1 appended.

ChatGPT Discord Bot

When deploying the Discord Bot application, the following environment variables must be set:

Variable	Value
OPENAI_API_KEY	API key generated during deployment
OPENAI_API_BASE	Google Cloud Run URL with `/v1`

ChatGPT in Slack

When deploying the ChatGPT in Slack application, the following environment variables must be set:

Variable	Value
OPENAI_API_KEY	API key generated during deployment
OPENAI_API_BASE	Google Cloud Run URL with `/v1`

ChatGPT Telegram Bot

When deploying the ChatGPT Telegram Bot application, the following environment variables must be set:

Variable	Value
OPENAI_API_KEY	API key generated during deployment
OPENAI_API_BASE	Google Cloud Run URL with `/v1`
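The integrations above expect the URL in two forms: Chatbot UI (OPENAI_API_HOST) and Chatbox (API Host) take the bare Cloud Run URL, while VSCode-OpenAI and the Discord, Slack, and Telegram bots take the URL with /v1. A sketch that derives both forms from one URL; all helper names are hypothetical.

```python
from typing import Dict


def bare_host(url: str) -> str:
    """Cloud Run URL without a trailing /v1 (Chatbot UI, Chatbox)."""
    url = url.rstrip("/")
    return url[: -len("/v1")] if url.endswith("/v1") else url


def api_base(url: str) -> str:
    """Cloud Run URL with /v1 (VSCode-OpenAI, Discord, Slack, Telegram bots)."""
    return bare_host(url) + "/v1"


def chatbot_ui_env(url: str, api_key: str) -> Dict[str, str]:
    """Variables for Chatbot UI."""
    return {"OPENAI_API_KEY": api_key, "OPENAI_API_HOST": bare_host(url)}


def bot_env(url: str, api_key: str) -> Dict[str, str]:
    """Variables for the Discord, Slack, and Telegram bots."""
    return {"OPENAI_API_KEY": api_key, "OPENAI_API_BASE": api_base(url)}
```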

Contributing

Have a patch that will benefit this project? Awesome! Follow these steps to have it accepted.

Please read how to contribute.
Fork this Git repository and make your changes.
Create a Pull Request.
Incorporate review feedback into your changes.
Accepted!

License

All files in this repository are under the Apache License, Version 2.0 unless noted otherwise.

Cyclenerd/google-cloud-gcp-openai-api