Satellite Scene Classification using a Pre-trained Vision Transformer

This project fine-tunes a pre-trained Vision Transformer (ViT) model on the NWPU-RESISC45 dataset for scene classification using the Hugging Face Transformers library.

Installation

Clone this repository:

git clone https://github.com/aj1365/ViT_Finetuning

Install the required packages:
```
pip install -r requirements.txt
```
Download the NWPU-RESISC45 dataset from Kaggle and place it in your preferred directory:

https://www.kaggle.com/datasets/aqibrehmanpirzada/nwpuresisc45
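Before training, it can help to confirm that the download unpacked as expected. The following is a minimal sketch, not part of this repository, and it assumes the dataset directory contains one subfolder per scene class with the image files directly inside. The function name `count_images_per_class` and the list of accepted file extensions are hypothetical.

```python
from pathlib import Path

# Assumption: images are stored with one of these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(data_dir):
    """Return {class_name: image_count}, assuming one subfolder per class."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {root}")

    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count only files whose extension looks like an image.
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `count_images_per_class("/path/to/dataset")` on the full NWPU-RESISC45 release should report 45 classes with 700 images each. Different numbers may mean the archive is nested one level deeper or only partially extracted.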

Usage

Run the training script with the following command-line arguments:

--data_dir: Path to the NWPU-RESISC45 dataset directory.

--batch_size: Batch size for training.

--eval_batch_size: Batch size for evaluation.

--epochs: Number of training epochs.

For example:

python main.py --data_dir /path/to/dataset --batch_size 16 --eval_batch_size 16 --epochs 10
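As an illustration of how the four arguments above could be parsed, here is a minimal sketch using Python's standard `argparse` module. It is not taken from `main.py`, whose actual implementation may differ. The function name `build_parser` and the default values are assumptions that simply mirror the example command.

```python
import argparse


def build_parser():
    """Build a parser for the four documented options (hypothetical defaults)."""
    parser = argparse.ArgumentParser(
        description="Fine-tune a ViT model on NWPU-RESISC45 (illustrative sketch)."
    )
    parser.add_argument("--data_dir", type=str, required=True,
                        help="Path to the NWPU-RESISC45 dataset directory.")
    # Defaults below are assumptions copied from the example command.
    parser.add_argument("--batch_size", type=int, default=16,
                        help="Batch size for training.")
    parser.add_argument("--eval_batch_size", type=int, default=16,
                        help="Batch size for evaluation.")
    parser.add_argument("--epochs", type=int, default=10,
                        help="Number of training epochs.")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--data_dir", "/path/to/dataset", "--batch_size", "16", "--epochs", "10"]
)
print(args.data_dir, args.batch_size, args.eval_batch_size, args.epochs)
```

In a real script, `parse_args()` would be called without arguments so that the values come from the command line.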