The project is a Line bot application integrated with the OpenAI API. It is designed to accompany my girlfriend while I am serving in the mandatory military service in Taiwan (R.O.C).
- The project involves fine-tuning a GPT-3.5-Turbo model to simulate my chatting style.
- Create a conda env
conda create -n bot python==3.8 conda activate bot pip install -r requirements
- Set the env variable, the api key need to obtain from Linebot and OpenAI official website
export LINE_CHANNEL_ACCESS_TOKEN={LINE_CHANNEL_ACCESS_TOKEN} export LINE_CHANNEL_SECRET={LINE_CHANNEL_SECRET} export OPENAI_API_KEY={OPENAI_API_KEY}
- Test
app.py
is working w/o errors.python app.py
-
Install nginx
sudo apt update sudo apt install nginx openssl
-
Install certbot for TLS (https)
sudo apt install certbot python3-certbot-nginx
-
Set the
location
below `# Default server configuration as follows:server_name {your domain name} location / { proxy_pass http://127.0.0.1:8082; proxy_set_header Host $host; }
-
Set the server ip to your domain (Google Domains, etc.)
-
sudo certbot -d {your domain name}
-
Modify the nginx configuration
sudo vim /etc/nginx/site-enabled/default
-
Reload nginx:
sudo nginx -t && sudo nginx -s reload
- Create training file
mydata.jsonl
which follow the format described from example-format - Run
data_formatting.py
which is provided by OpenAI to valid the training file - The result looks like as follows:
No errors found Num examples missing system message: 0 Num examples missing user message: 0 #### Distribution of num_messages_per_example: min / max: 3, 5 mean / median: 3.3035714285714284, 3.0 p5 / p95: 3.0, 5.0 #### Distribution of num_total_tokens_per_example: min / max: 181, 280 mean / median: 216.23214285714286, 214.0 p5 / p95: 190.0, 248.5 #### Distribution of num_assistant_tokens_per_example: min / max: 4, 75 mean / median: 23.714285714285715, 21.0 p5 / p95: 9.0, 44.5 0 examples may be over the 4096 token limit, they will be truncated during fine-tuning Dataset has ~12109 tokens that will be charged for during training By default, you'll train for 3 epochs on this dataset By default, you'll be charged for ~36327 tokens See pricing page to estimate total costs
- RUN
finetune.py
- We can use the
gpt-3.5-turbo
which is provided by OpenAI official as default or ours fine-tuned model.# L: 15 model = 'gpt-3.5-turbo'
- RUN
app.py