Code for Dataset Pruning for Sentiment Analysis Fine-Tuning, a project that explores three metrics for pruning sentiment analysis fine-tuning datasets: sentence length, number of clusters, and mean distance in GloVe embedding space.
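
The sketch below shows one way two of these metrics could be computed and used to prune a dataset. It is not the code in `pruning.ipynb`. The helper names (`sentence_length`, `mean_distance_to_centroid`, `prune`) are hypothetical. "Mean distance" is assumed here to mean the average Euclidean distance of a sentence's word vectors from their centroid. The keep-a-fraction selection rule is also an assumption. The number-of-clusters metric is omitted.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical helpers for illustration; NOT the implementation in pruning.ipynb.
Embeddings = Dict[str, List[float]]


def sentence_length(sentence: str) -> int:
    """Sentence-length metric: number of whitespace-separated tokens."""
    return len(sentence.split())


def mean_distance_to_centroid(sentence: str, glove: Embeddings) -> float:
    """Assumed reading of "mean distance in GloVe space": the average Euclidean
    distance of the sentence's in-vocabulary word vectors from their centroid.
    Words are lowercased because the glove.6B vocabulary is lowercase."""
    vectors = [glove[w] for w in sentence.lower().split() if w in glove]
    if not vectors:
        return 0.0  # no known words: treat the spread as zero
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return sum(math.dist(v, centroid) for v in vectors) / len(vectors)


def prune(
    sentences: Sequence[str],
    score: Callable[[str], float],
    keep_fraction: float,
    keep_highest: bool = True,
) -> List[str]:
    """Assumed selection rule: keep the top (or bottom) fraction by score.
    Ties keep their original order because sorted() is stable."""
    k = max(1, int(len(sentences) * keep_fraction))
    ranked = sorted(sentences, key=score, reverse=keep_highest)
    return ranked[:k]
```

For example, with a toy two-dimensional table `{"good": [1.0, 0.0], "movie": [0.0, 1.0]}`, `mean_distance_to_centroid("good movie", toy)` is sqrt(0.5), about 0.707. To prune by the embedding metric, pass a closure such as `lambda s: mean_distance_to_centroid(s, glove)` as `score`.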

- `Model-Agnostic Dataset Pruning for Sentiment Analysis Fine-Tuning.pdf`: ACL-style report of the project
- `pruning.ipynb`: Python notebook containing the pruning and experiment code

To run the code:
- Download and unzip `glove.6B.zip` from https://nlp.stanford.edu/projects/glove/
- Add the file `glove.6B.50d.txt` to your working directory (i.e., the directory containing the Python notebook).
- Optional: Get an access token from Hugging Face to log in through the notebook.
- If running in Colab, set `FOLDER_NAME` in the first cell of the notebook to your working directory.
- Run the cells of the notebook.
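
Once `glove.6B.50d.txt` is in the working directory, it can be parsed with standard-library Python. This is a minimal sketch, not the loader used in `pruning.ipynb`, and `load_glove` is a hypothetical name. It relies only on the GloVe text format. In that format, each line holds a token followed by its space-separated vector components.

```python
from typing import Dict, List


def load_glove(path: str = "glove.6B.50d.txt") -> Dict[str, List[float]]:
    """Hypothetical loader: map each token in a GloVe text file to its vector."""
    embeddings: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()  # token, then one float per dimension
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings
```

Called with the default path, `load_glove()` returns a dictionary mapping each token to a list of 50 floats, which can then be passed to any helper that looks up word vectors.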