transformers-openai-api
is a server for hosting locally running NLP transformers models via the OpenAI Completions API. In short, you can run transformers models and offer them through an API compatible with existing OpenAI tooling, such as the OpenAI Python client itself or any package that uses it (e.g. LangChain).
```shell
pip install transformers-openai-api
wget https://raw.githubusercontent.com/jquesnelle/transformers-openai-api/master/config.example.json
mv config.example.json config.json
transformers-openai-api
```
```shell
git clone https://github.com/jquesnelle/transformers-openai-api
cd transformers-openai-api
cp config.example.json config.json
pip install -r requirements.txt
python transformers_openai_api/
```
Simply set the environment variable `OPENAI_API_BASE` to `http://HOST:PORT/v1` before importing the `openai` package. For example, to access a local instance of transformers-openai-api, set `OPENAI_API_BASE` to `http://127.0.0.1:5000/v1`. Alternatively, you can set the `api_base` property on the `openai` object:
```python
import openai
openai.api_base = 'http://HOST:PORT/v1'
```
All configuration is managed through `config.json`. By default transformers-openai-api looks for this file in the current working directory; a different path can be passed as a command-line argument to the program. See `config.example.json`.
By default the API server listens on `127.0.0.1:5000`. To change this, add `HOST` and/or `PORT` entries to the configuration file. For example, to serve publicly:
```json
{
  "HOST": "0.0.0.0",
  "PORT": 80
}
```
The `MODELS` object maps an OpenAI model name to a `transformers` model configuration. The structure of a model configuration is:
| Key | Description |
|---|---|
| `ENABLED` | Boolean; set to `false` to disable a model |
| `TYPE` | Either `"Seq2Seq"` or `"CausalLM"` |
| `MODEL_CONFIG` | Parameters for model creation; passed to `AutoModelForSeq2SeqLM.from_pretrained` or `AutoModelForCausalLM.from_pretrained`, according to `TYPE` |
| `MODEL_DEVICE` | Move the model to this device; passed to `to()` called on the created model (default `cuda`) |
| `TOKENIZER_CONFIG` | Parameters for tokenizer creation; passed to `AutoTokenizer.from_pretrained` |
| `TOKENIZER_DEVICE` | Move tokens to this device; passed to `to()` called on the tokenized input (default `cuda`) |
| `GENERATE_CONFIG` | Parameters for generation; passed to the model's `generate` function |
| `DECODE_CONFIG` | Parameters for decoding; passed to the tokenizer's `decode` function |
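Putting these keys together, a minimal `MODELS` entry might look like the following sketch. The OpenAI model name and the checkpoint are illustrative, and this assumes the checkpoint is named via `pretrained_model_name_or_path`, which `from_pretrained` accepts as a keyword argument:

```json
{
  "MODELS": {
    "text-davinci-003": {
      "ENABLED": true,
      "TYPE": "CausalLM",
      "MODEL_CONFIG": {
        "pretrained_model_name_or_path": "gpt2"
      },
      "GENERATE_CONFIG": {
        "max_new_tokens": 256
      }
    }
  }
}
```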
To use accelerate, set `device_map` in the `MODEL_CONFIG` to `"auto"` and explicitly set `MODEL_DEVICE` to `null`. The default `text-davinci-003` model in `config.example.json` is an example of this.
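As a sketch, the relevant parts of such a model configuration might look like this (other keys omitted; `device_map` is forwarded to `from_pretrained` via `MODEL_CONFIG`):

```json
{
  "MODEL_CONFIG": {
    "device_map": "auto"
  },
  "MODEL_DEVICE": null
}
```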
To switch to CPU inference, set `MODEL_DEVICE` and `TOKENIZER_DEVICE` to `cpu`.
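For example, a CPU-only model configuration would include this fragment (other keys omitted):

```json
{
  "MODEL_DEVICE": "cpu",
  "TOKENIZER_DEVICE": "cpu"
}
```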
To use a model at half precision, set `torch_dtype` in the `MODEL_CONFIG` to `float16`. The disabled `text-curie-001` model in `config.example.json` is an example of this.
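As a sketch, the half-precision fragment of a model configuration might look like this (other keys omitted; the string is passed through to `from_pretrained` as its `torch_dtype` argument):

```json
{
  "MODEL_CONFIG": {
    "torch_dtype": "float16"
  }
}
```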
To limit access to the API (i.e. to enforce `OPENAI_API_KEY`), fill in the `BEARER_TOKENS` object with a list of authorized tokens (e.g. your OpenAI key). If the `BEARER_TOKENS` list does not exist, no authorization will be enforced.
```json
{
  "BEARER_TOKENS": ["sk-..."]
}
```