
Testing of model-agnostic metrics in data pruning for fine-tuning sentiment analysis models


Dataset-Pruning-Sentiment-Analysis

Code for Dataset Pruning for Sentiment Analysis Fine-Tuning, which explores three model-agnostic pruning metrics (sentence length, number of clusters, and mean distance in GloVe space) for pruning the datasets used to fine-tune sentiment analysis models.
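To make two of these metrics concrete, here is a minimal sketch. It assumes that sentence length means the number of whitespace-separated tokens and that mean distance means the average pairwise Euclidean distance between a sentence's token vectors; the report and notebook may define them differently. The function names and the toy 2-d embeddings are hypothetical and only for illustration.

```python
import math
from itertools import combinations


def sentence_length(sentence):
    """Number of whitespace-separated tokens (assumed definition)."""
    return len(sentence.split())


def mean_pairwise_distance(sentence, embeddings):
    """Mean Euclidean distance over all pairs of in-vocabulary token vectors.

    Assumed definition; tokens missing from `embeddings` are skipped.
    Returns 0.0 when fewer than two tokens have a vector.
    """
    vectors = [embeddings[t] for t in sentence.lower().split() if t in embeddings]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Toy 2-d vectors standing in for real GloVe embeddings (hypothetical values).
toy_embeddings = {
    "good": (1.0, 0.0),
    "movie": (0.0, 1.0),
    "bad": (-1.0, 0.0),
}

if __name__ == "__main__":
    example = "Good bad movie"
    print(sentence_length(example))
    print(mean_pairwise_distance(example, toy_embeddings))
```

A pruning step could then rank training examples by one of these scores and keep only a chosen fraction; which end of the ranking to keep is an experimental choice this sketch does not settle.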

Model-Agnostic Dataset Pruning for Sentiment Analysis Fine-Tuning.pdf: ACL-style report of project

pruning.ipynb: Python Notebook of Pruning and Experiment Code

To run the code:

  1. Download and unzip glove.6B.zip from https://nlp.stanford.edu/projects/glove/
  2. Add the file glove.6B.50d.txt to your working directory (i.e., the directory with the Python Notebook).
  3. Optional: Get an access token from Hugging Face to log in through the notebook.
  4. If running in Colab, set FOLDER_NAME in the first cell of the Python Notebook to your working directory.
  5. Run the cells of the Notebook.
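
Before running the notebook, it can help to confirm that the GloVe file from steps 1 and 2 is in place and readable. The sketch below is not taken from the notebook; the helper names `parse_glove_lines` and `load_glove` are hypothetical. It relies only on the published GloVe text format, where each line holds a word followed by its space-separated vector components (50 of them for glove.6B.50d.txt).

```python
from pathlib import Path


def parse_glove_lines(lines, dim=50):
    """Parse GloVe-format text lines into a {word: [floats]} dictionary.

    Lines that do not have exactly `dim` components after the word are skipped.
    """
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def load_glove(path="glove.6B.50d.txt", dim=50):
    """Load a GloVe text file, failing early with a clear message if it is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download glove.6B.zip and place "
            "glove.6B.50d.txt in the working directory"
        )
    with path.open(encoding="utf-8") as f:
        return parse_glove_lines(f, dim)


if __name__ == "__main__":
    vectors = load_glove()
    print(f"Loaded {len(vectors)} word vectors")
```

Splitting the parser from the file check keeps the parsing logic usable on any iterable of lines, which makes it easy to try on a few sample lines without the full download.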