Built using Shark-NLP's OpenICL framework.
Can we beat the ground truth by using soft labels during in-context learning? This repository helps users answer that question by letting them create soft labels on a dataset of their choice and distill this soft-label information into a student model during in-context learning.
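At its core this is knowledge distillation: instead of training on one-hot ground-truth labels, the student learns from the teacher's softened class distribution. Below is a minimal sketch of that objective, assuming PyTorch and hypothetical tensor names; it illustrates the general technique and is not necessarily the exact loss this repository implements:

```python
import torch
import torch.nn.functional as F


def soft_label_distillation_loss(
    student_logits: torch.Tensor,  # (batch, num_classes), hypothetical name
    teacher_logits: torch.Tensor,  # (batch, num_classes), hypothetical name
    temperature: float = 2.0,
) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    distributions -- the standard soft-label distillation objective.
    Illustrative sketch only, not this repo's exact implementation."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitude stays comparable across temperatures
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2
```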
To install dependencies:

```bash
make poetry-install
```

If building the FAISS index is too slow on your GPU, run:

```bash
make poetry-faiss-gpu-reinstall
```
To create the training data:

- Download the `json`-format data into an appropriate `data` folder
- Create a config file under `config/data` following the template (More information on template fields); a hypothetical config sketch follows this list
- Run the command below:

  ```bash
  make run-create_train SETUP_DICT="config/data/<config_file_name>.json"
  ```

- Use `data_utils/generated_train_dist.ipynb` to get dataset statistics
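The actual template fields are defined in the repo; the following is only a hypothetical sketch of what a `config/data` file might look like, and every field name in it is an assumption rather than the real schema:

```json
{
  "_comment": "hypothetical sketch only -- consult the real template under config/data",
  "dataset": "sst2",
  "data_path": "data/sst2.json",
  "output_path": "data/sst2_train.json"
}
```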
Currently supports: sst2, sst5, trec, ag_news, yelp, qnli, mnli
```bash
# to pretrain BERT
make train-bert dataset="<dataset_name>"

# create train data using pretrained BERT
make infer-bert checkpoint_path="<path_to_ckpt>" dataset="<dataset_name>" file_name="<output_file_name>"
```
To run distillation:

- Create a config file under `config/distill` following the template (More information on template fields); a hypothetical config sketch follows this list
- Run the command below:

  ```bash
  make run-distill SETUP_DICT="config/distill/<config_file_name>.json"
  ```
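As above, the real template lives in the repo; this is only a hypothetical sketch of a `config/distill` file, with every field name an assumption:

```json
{
  "_comment": "hypothetical sketch only -- consult the real template under config/distill",
  "dataset": "sst2",
  "train_file": "data/sst2_train.json",
  "temperature": 2.0
}
```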
A `.txt` file with accuracies and a `.png` file of the corresponding plots will be saved as artifacts.