This project aims to refine the balance between computational efficiency and model performance.
It explores the compute-memory trade-off, with the goal of establishing a set of best practices for developing highly efficient, accurate Large Language Models.
finetune-lora.py -> The main file to start with.
local_dataset_utilities.py and local_model_utilities.py -> Utility files for the dataset and the model, respectively.
gridsearch.py -> Contains the code for the grid search.
grid_search_results.txt -> Contains the results of the grid search.
plots directory -> Contains the accuracy and loss graphs.
DistilRoberta directory -> Contains the code for our attempt to finetune the DistilRoBERTa model on the Financial Sentiment dataset.
This code needs to be run on a GPU.
To run on a GPU-enabled system, run finetune-lora.py with the appropriate command-line arguments.
To run on HPC, submit the batch file run_DistilBert_gpu.SBATCH with the appropriate command-line arguments.
The supported command-line arguments are listed below; an example invocation follows the list.
--q_lora boolean -> Whether to enable QLoRA
--lora_r int -> Rank for LoRA layers
--lora_alpha int -> Alpha for LoRA layers
--lora_query boolean -> Apply LoRA to query
--lora_key boolean -> Apply LoRA to key
--lora_value boolean -> Apply LoRA to value
--lora_projection boolean -> Apply LoRA to projection layer
--lora_mlp boolean -> Apply LoRA to MLP
--lora_head boolean -> Apply LoRA to head
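For instance, a run enabling LoRA (r=8, alpha=16) on the query and value matrices might look like the following. Treat this as an illustration: the exact syntax for the boolean values depends on how finetune-lora.py parses its flags.

```
python finetune-lora.py --q_lora False --lora_r 8 --lora_alpha 16 --lora_query True --lora_value True
```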
Generated plots are saved in the plots directory of this repository.
Profiler results can be found in the log files inside the log folder.
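For context on what --lora_r and --lora_alpha control, the following is a minimal PyTorch sketch of the general LoRA technique: a trainable low-rank adapter added on top of a frozen linear layer. This is an illustration of the idea, not the exact implementation in finetune-lora.py, and the dimensions shown are hypothetical.

```python
import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    """Low-rank adapter: adds (alpha / r) * x @ A @ B to a frozen layer's output."""
    def __init__(self, in_dim, out_dim, r=8, alpha=16):
        super().__init__()
        # A projects down to rank r, B projects back up. B starts at zero,
        # so the adapter is a no-op at initialization.
        self.A = nn.Parameter(torch.randn(in_dim, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, out_dim))
        self.scaling = alpha / r

    def forward(self, x):
        return self.scaling * (x @ self.A @ self.B)

class LinearWithLoRA(nn.Module):
    """Wraps a frozen nn.Linear (e.g. a query/key/value projection) with LoRA."""
    def __init__(self, linear, r=8, alpha=16):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)  # only the LoRA parameters are trained
        self.lora = LoRALayer(linear.in_features, linear.out_features, r, alpha)

    def forward(self, x):
        return self.linear(x) + self.lora(x)

# Example: replace a query projection with its LoRA-wrapped version.
query = nn.Linear(768, 768)  # hypothetical hidden size
query = LinearWithLoRA(query, r=8, alpha=16)
```

Because only A and B receive gradients, the number of trainable parameters per wrapped layer drops from in_dim x out_dim to r x (in_dim + out_dim), which is where the memory savings come from.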
- Finetuning using LoRA (r=8,alpha=16), Test Accuracy - 89.71%
- Finetuning without LoRA (selective finetuning), Test Accuracy - 87.39%
- Finetuning without LoRA (full finetuning), Test Accuracy - 91.59%
- QLoRA, Test Accuracy - 91.99%
- LoRA with Adam, Test Accuracy - 92.39%
- LoRA with SGD, Test Accuracy - 66.45%
- LoRA with SGD and Nesterov, Test Accuracy - 91.46%
- LoRA with LinearLR scheduler, Test Accuracy - 92.44%
- LoRA with OneCycleLR scheduler, Test Accuracy - 50.06%
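The optimizer and scheduler variants above correspond to standard PyTorch APIs. The sketch below shows how each might be configured; the learning rates and step counts are illustrative placeholders, not the values used to produce the results above, and the variants are alternatives (pick one per run).

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the LoRA-augmented model
num_steps = 1000          # illustrative total number of training steps

# Adam, the strongest optimizer in the results above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# SGD with Nesterov momentum (nesterov=True requires momentum > 0).
# Plain SGD is the same call without the momentum/nesterov arguments.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4,
                            momentum=0.9, nesterov=True)

# LinearLR scales the learning rate linearly from start_factor to
# end_factor over total_iters scheduler steps.
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.1, total_iters=num_steps)

# OneCycleLR warms up to max_lr and then anneals back down; it is
# sensitive to max_lr and total_steps, which may explain why it
# underperformed in the results above.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=num_steps)

# In the training loop, call scheduler.step() after each optimizer.step().
```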