

This repository contains code for data preparation, training, and evaluation of various classifiers on the GoEmotions dataset.


Use this file to see how the original 28-class problem was mapped to a 6-class problem.
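As a rough illustration of such a label reduction, the sketch below uses the Ekman-style grouping published with the GoEmotions dataset. This is an assumption: the mapping actually used in this repository may differ, and how the `neutral` class is handled here (dropped, returned as `None`) is a placeholder choice. The names `EKMAN_GROUPS`, `FINE_TO_COARSE`, and `to_coarse` are hypothetical.

```python
# Assumption: Ekman-style grouping published with GoEmotions.
# The repository's own 28-to-6 mapping may differ.
EKMAN_GROUPS = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": [
        "admiration", "amusement", "approval", "caring", "desire",
        "excitement", "gratitude", "joy", "love", "optimism",
        "pride", "relief",
    ],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}

# Invert the grouping into a fine-label -> coarse-label lookup table.
FINE_TO_COARSE = {
    fine: coarse
    for coarse, fines in EKMAN_GROUPS.items()
    for fine in fines
}


def to_coarse(fine_label):
    """Return the coarse class for a fine label, or None if it is not grouped.

    In this sketch "neutral" is not part of any group, so it maps to None.
    """
    return FINE_TO_COARSE.get(fine_label)
```

For example, `to_coarse("annoyance")` returns `"anger"`, while `to_coarse("neutral")` returns `None`.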


This file implements a sequence classification pipeline with HuggingFace. It also includes a hyperparameter search built on the Ray library and the HuggingFace Trainer module.


This file has Lightning modules for data preparation and training, for use with the Lightning framework. I used 4-bit quantization from BitsAndBytes and the peft library from HuggingFace to train a 13-billion-parameter model on a 16GB GPU.
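A back-of-the-envelope calculation shows why 4-bit quantization matters on a 16GB card. The sketch below counts weight storage only; it ignores quantization constants, adapter weights, activations, and optimizer state, so real usage is higher. The helper `weight_memory_gb` is hypothetical and uses decimal gigabytes.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in decimal GB (weights only, no overhead)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 13e9  # 13 billion parameters

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # half precision: about 26 GB
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized: about 6.5 GB

print(f"fp16 weights: {fp16_gb:.1f} GB, 4-bit weights: {int4_gb:.1f} GB")
```

Under these assumptions the half-precision weights alone exceed 16GB, while the 4-bit weights leave room for the adapters and training state.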


Use this file together with the previous one to run supervised finetuning under various parameter settings.


I extract the adapters from the checkpoints created by supervised_finetuning.py. This saves space because the base Llama2-13B model is frozen during supervised finetuning and therefore doesn't need to be saved.
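The idea can be sketched as a filter over the checkpoint's parameter names. This is a minimal sketch, not the script in this repository: it assumes the adapter parameters can be recognized by a substring in their names (`"lora_"` here, which is how peft names LoRA weights, though the adapter type used in this repo is an assumption), and it uses plain strings in place of tensors so it runs without a GPU. The function `extract_adapter_weights` and the example parameter names are hypothetical.

```python
def extract_adapter_weights(state_dict, marker="lora_"):
    """Keep only entries whose parameter name contains the adapter marker.

    Assumption: adapter parameters are identifiable by `marker` in their name.
    """
    return {name: value for name, value in state_dict.items() if marker in name}


# Toy stand-in for a checkpoint's weights. With a real Lightning checkpoint
# the weights would typically sit under the "state_dict" key of the loaded file.
checkpoint = {
    "state_dict": {
        "model.layers.0.self_attn.q_proj.weight": "frozen base weight",
        "model.layers.0.self_attn.q_proj.lora_A.weight": "adapter weight",
        "model.layers.0.self_attn.q_proj.lora_B.weight": "adapter weight",
    }
}

adapter_only = extract_adapter_weights(checkpoint["state_dict"])
print(sorted(adapter_only))
```

Only the two adapter entries survive the filter; the frozen base weight is dropped and can be reloaded from the original model instead.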


Here I load the adapters saved in the previous step with the Llama2-13B model and generate text from the validation split of the GoEmotions dataset. The module takes its inputs from the command line, so it can be run on various checkpoints.
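A command-line interface of this kind might look like the sketch below. The flag names (`--adapter-dir`, `--split`, `--max-new-tokens`) and their defaults are hypothetical and are not taken from the actual script; only the `argparse` calls are standard library.

```python
import argparse


def build_parser():
    """Build a parser for a generation script (all flag names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Generate labels with a base model plus saved adapters."
    )
    parser.add_argument("--adapter-dir", required=True,
                        help="Directory holding the extracted adapter weights.")
    parser.add_argument("--split", default="validation",
                        help="Dataset split to generate from.")
    parser.add_argument("--max-new-tokens", type=int, default=8,
                        help="Upper bound on generated tokens per sample.")
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["--adapter-dir", "adapters/run1"])
print(args.adapter_dir, args.split, args.max_new_tokens)
```

Exposing the adapter location as an argument lets the same script be looped over several checkpoints from a shell.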


This is a short script that calls the OpenAI text completion API to generate labels for the validation split in a zero-shot fashion.
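Zero-shot labeling amounts to building a prompt that lists the allowed labels and then mapping the model's free-text answer back to one of them. The sketch below shows only those two steps and leaves the API call out. The prompt wording, the helpers `build_prompt` and `parse_label`, and the example label names are all assumptions, not the ones used in this repository.

```python
def build_prompt(text, labels):
    """Build a zero-shot classification prompt (wording is hypothetical)."""
    options = ", ".join(labels)
    return (
        f"Classify the emotion of the following comment as one of: {options}.\n"
        f"Comment: {text}\n"
        "Emotion:"
    )


def parse_label(completion, labels):
    """Map a raw completion to a known label, or None if nothing matches."""
    answer = completion.strip().lower()
    for label in labels:
        if answer.startswith(label.lower()):
            return label
    return None


example_labels = ["anger", "joy", "sadness"]  # hypothetical label set
prompt = build_prompt("Thanks so much, this made my day!", example_labels)
print(prompt)
```

Returning `None` for unmatched completions keeps malformed answers visible, so they can be counted separately during evaluation instead of being silently assigned to a class.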


Here I list the samples generated by GPT3.5-Turbo and the finetuned Llama2-13B models. The model predictions are also available in pkl format.


A notebook to visualize a PyTorch model with adapter modules.


A folder with various debugging scripts and notebooks.


A file with the commands I used for supervised finetuning.