TFM-Classifier

A transformer-based classifier built on the DistilBERT model.


IMDb Sentiment Classification with DistilBERT

This project performs sentiment classification on the IMDb dataset with DistilBERT. The script fine-tunes the pre-trained DistilBERT model on the IMDb training data and evaluates its performance on the test set.

Setup

Prerequisites

  • Python 3.8 or later
  • An NVIDIA GPU with CUDA installed (optional, but recommended for faster training)
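Once the requirements below are installed, you can confirm that PyTorch detects your GPU with a quick one-liner (a convenience check, not part of the project's scripts):

    python -c "import torch; print(torch.cuda.is_available())"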

Installation

  1. Clone the repository:

    git clone https://github.com/your-repository/imdb-sentiment-classification.git
    cd imdb-sentiment-classification
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt

Usage

Training and Evaluation

To train and evaluate the model, run the following command:

python tfm_classifier.py

This script will (see the sketch after this list):

  • Load and preprocess the IMDb dataset.
  • Fine-tune the DistilBERT model.
  • Evaluate the model on the test set.
  • Save the fine-tuned model and tokenizer.
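The exact contents of tfm_classifier.py live in the repository; the sketch below only illustrates the kind of fine-tuning workflow described above, using the Hugging Face Trainer API. The checkpoint name, output directories, and hyperparameters are illustrative assumptions, not values taken from the project.

    # Illustrative sketch only; not the project's actual script.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    # Load and tokenize the IMDb dataset.
    dataset = load_dataset("imdb")
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

    dataset = dataset.map(tokenize, batched=True)

    # DistilBERT with a binary classification head (positive/negative).
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2
    )

    # Hyperparameters here are assumptions for illustration.
    args = TrainingArguments(
        output_dir="./results",
        num_train_epochs=1,
        per_device_train_batch_size=16,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
    )

    trainer.train()
    print(trainer.evaluate())

    # Save the fine-tuned model and tokenizer.
    model.save_pretrained("./fine_tuned_distilbert")
    tokenizer.save_pretrained("./fine_tuned_distilbert")

Once saved, the model can be loaded back for inference, for example with the Transformers pipeline helper (the path is the assumed output directory from the sketch above):

    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model="./fine_tuned_distilbert")
    print(classifier("A thoroughly enjoyable film."))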

Requirements

  • torch: PyTorch for model training and inference.
  • transformers: Hugging Face Transformers library for using the DistilBERT model.
  • datasets: Hugging Face Datasets library for loading and processing the IMDb dataset.
  • pandas: Data manipulation library.
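If the repository does not already ship a requirements.txt, a minimal one matching the list above might look like this (versions left unpinned for illustration; the project may pin specific versions):

    torch
    transformers
    datasets
    pandas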

Notes:

  • Replace your-repository in the clone URL with the actual repository path if you host the project yourself.
  • Make sure the tfm_classifier.py script contains the project's training and evaluation code.


License

This project is licensed under the MIT License. See the LICENSE file for details.