Line Chatbot with OpenAI API

The project is a Line bot application integrated with the OpenAI API. It is designed to accompany my girlfriend while I am serving in the mandatory military service in Taiwan (R.O.C).

The project involves fine-tuning a GPT-3.5-Turbo model to simulate my chatting style.

Environment

Setting up an Anaconda Environment in Ubuntu 20.04

Create a conda env

conda create -n bot python==3.8
conda activate bot
pip install -r requirements

Set the env variable, the api key need to obtain from Linebot and OpenAI official website

export LINE_CHANNEL_ACCESS_TOKEN={LINE_CHANNEL_ACCESS_TOKEN}
export LINE_CHANNEL_SECRET={LINE_CHANNEL_SECRET}
export OPENAI_API_KEY={OPENAI_API_KEY}

Test app.py is working w/o errors.
```
python app.py
```

Setting up Nginx and TLS certification

Install nginx

sudo apt update
sudo apt install nginx openssl

Install certbot for TLS (https)

sudo apt install certbot python3-certbot-nginx

Set the location below `# Default server configuration as follows:

server_name {your domain name}
location / {
        proxy_pass http://127.0.0.1:8082;
        proxy_set_header Host $host;
    }

Set the server ip to your domain (Google Domains, etc.)
sudo certbot -d {your domain name}

Modify the nginx configuration

sudo vim /etc/nginx/site-enabled/default

Reload nginx: sudo nginx -t && sudo nginx -s reload

Fine-tune an OpenAI Model

Create training file mydata.jsonl which follow the format described from example-format
Run data_formatting.py which is provided by OpenAI to valid the training file

The result looks like as follows:

No errors found                                                 
Num examples missing system message: 0                                                                                                                                                         
Num examples missing user message: 0                                                                                                                                                           

#### Distribution of num_messages_per_example:
min / max: 3, 5
mean / median: 3.3035714285714284, 3.0
p5 / p95: 3.0, 5.0

#### Distribution of num_total_tokens_per_example:
min / max: 181, 280
mean / median: 216.23214285714286, 214.0
p5 / p95: 190.0, 248.5

#### Distribution of num_assistant_tokens_per_example:
min / max: 4, 75
mean / median: 23.714285714285715, 21.0
p5 / p95: 9.0, 44.5

0 examples may be over the 4096 token limit, they will be truncated during fine-tuning
Dataset has ~12109 tokens that will be charged for during training
By default, you'll train for 3 epochs on this dataset
By default, you'll be charged for ~36327 tokens
See pricing page to estimate total costs

RUN finetune.py

Specify the Model on `app.py`

We can use the gpt-3.5-turbo which is provided by OpenAI official as default or ours fine-tuned model.
```
# L: 15
model = 'gpt-3.5-turbo'
```

Execute the LineBot client

RUN app.py

gary9987/line-chatbot-with-openai-api