This repository is based on DeepSpeed-Chat. The main changes are in /applications.
The general steps are:
Starting from the raw dataset, merge the separate zip files into a single text file -> python unzip_scrip.py
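A minimal sketch of this merge step, assuming the archives sit under a `raw_data/` directory of zip files (the actual layout and the logic inside unzip_scrip.py are not released, so the paths here are hypothetical):

```python
import glob
import zipfile

def merge_archives(pattern, out_path):
    """Concatenate every file inside the zip archives matching `pattern`
    into one holistic text file at `out_path`."""
    with open(out_path, "w", encoding="utf-8") as out:
        for archive in sorted(glob.glob(pattern)):
            with zipfile.ZipFile(archive) as zf:
                for member in zf.namelist():
                    if member.endswith("/"):
                        continue  # skip directory entries
                    out.write(zf.read(member).decode("utf-8", errors="replace"))
                    out.write("\n")

# Hypothetical invocation; substitute the real archive location:
# merge_archives("raw_data/*.zip", "combined_hands.txt")
```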
Keep only the games that reached a "SHOWDOWN" -> python order_change.py
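The filtering amounts to keeping hands whose history contains the "SHOWDOWN" marker. A sketch, assuming hands in the merged text file are separated by blank lines (order_change.py may use a different delimiter):

```python
def split_hands(text):
    # Assumption: individual hand histories are separated by blank lines.
    return [h.strip() for h in text.split("\n\n") if h.strip()]

def keep_showdown_hands(hands):
    # Hands that went to showdown carry the "SHOWDOWN" marker in
    # standard hand-history exports; everything else is dropped.
    return [h for h in hands if "SHOWDOWN" in h]
```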
Filter good players; by tuning the script's parameters, players of different skill levels can be selected -> python good_players.py
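One plausible shape for this filter is thresholding per-player statistics; the thresholds below (`min_hands`, `min_winrate`) are illustrative stand-ins for whatever parameters good_players.py actually exposes:

```python
from collections import defaultdict

def filter_good_players(results, min_hands=100, min_winrate=0.0):
    """results: iterable of (player, profit) pairs, one per hand played.
    Returns the set of players with enough volume and a high enough
    average profit per hand. Thresholds are hypothetical."""
    hands = defaultdict(int)
    profit = defaultdict(float)
    for player, p in results:
        hands[player] += 1
        profit[player] += p
    return {
        name for name in hands
        if hands[name] >= min_hands and profit[name] / hands[name] >= min_winrate
    }
```

Raising `min_winrate` selects stronger players; lowering it admits weaker ones, which is how different skill levels can be carved out of the same data.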
Conduct prompt engineering on the filtered dataset -> python prompt_engieering3.py
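Prompt engineering here means turning each hand into a prompt/completion training example. The template below is a hypothetical sketch; the exact format emitted by prompt_engieering3.py has not been released:

```python
def build_example(hand_history, hero, action):
    """Wrap one hand history into an SFT example: the prompt presents the
    situation from the hero's seat, and the completion is the action the
    filtered (good) player actually took. Template is an assumption."""
    prompt = (
        f"You are {hero} in the following poker hand.\n"
        f"{hand_history}\n"
        f"What is your next action?"
    )
    return {"prompt": prompt, "completion": " " + action}
```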
The final prompt files have names starting with "prompt_".
Supervised fine-tuning -> reward modeling -> RLHF. For further options, such as alternative pre-trained models and hyperparameter tuning, refer to: https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat
Also, you should overwrite the "training" folder of the original DeepSpeed-Chat pipeline with ours. For privacy reasons, we have not yet released our data and model.