This project fine-tunes a pretrained Vision Transformer (ViT) model on the NWPU-RESISC45 dataset for remote sensing scene classification, using the HuggingFace Transformers library.
-
Clone this repository:
git clone https://github.com/aj1365/ViT_Finetuning
cd ViT_Finetuning
-
Install the required packages:
pip install -r requirements.txt
-
Download the NWPU-RESISC45 dataset from Kaggle and extract it to a directory of your choice (this is the path you pass as --data_dir):
https://www.kaggle.com/datasets/aqibrehmanpirzada/nwpuresisc45
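Before training, it can help to confirm what the extracted dataset looks like. The sketch below assumes a one-folder-per-class layout (each sub-folder name is a class name), which is common for NWPU-RESISC45 copies but is not confirmed here for this Kaggle archive. The helpers build_label_maps and count_images are hypothetical and are not part of this repository.

```python
from pathlib import Path


def build_label_maps(data_dir):
    """Map each class sub-folder of data_dir to an integer label.

    Hypothetical helper; assumes one sub-folder per scene class.
    """
    root = Path(data_dir)
    # Sort folder names so the label assignment is deterministic across runs.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label2id = {name: i for i, name in enumerate(class_names)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def count_images(data_dir, exts=(".jpg", ".jpeg", ".png", ".tif")):
    """Count image files per class folder.

    Hypothetical helper; the accepted extensions are an assumption.
    """
    root = Path(data_dir)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Only count regular files whose suffix looks like an image.
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in exts
        )
    return counts
```

The original NWPU-RESISC45 release has 45 scene classes with 700 images each, so count_images should report 45 entries if the archive is unsplit; a pre-split copy (for example one with separate train and test folders) would need the root adjusted. The label2id and id2label dictionaries are in the form Transformers model configs accept (for example as keyword arguments to from_pretrained), but whether main.py builds its labels this way is not stated.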
-
Run the training script, main.py, passing command-line arguments to specify the settings:
--data_dir: Path to the NWPU-RESISC45 dataset directory.
--batch_size: Batch size for training.
--eval_batch_size: Batch size for evaluation.
--epochs: Number of training epochs.
For example:
python main.py --data_dir /path/to/dataset --batch_size 16 --eval_batch_size 16 --epochs 10
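The four flags above map naturally onto Python's argparse. The sketch below is hypothetical and is not the repository's main.py: the function name parse_args and the default values (taken from the example command) are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the documented flags.

    Defaults are assumptions copied from the example command, not from main.py.
    """
    parser = argparse.ArgumentParser(
        description="Fine-tune a ViT on NWPU-RESISC45 (illustrative sketch)."
    )
    parser.add_argument("--data_dir", type=str, required=True,
                        help="Path to the NWPU-RESISC45 dataset directory.")
    parser.add_argument("--batch_size", type=int, default=16,
                        help="Batch size for training.")
    parser.add_argument("--eval_batch_size", type=int, default=16,
                        help="Batch size for evaluation.")
    parser.add_argument("--epochs", type=int, default=10,
                        help="Number of training epochs.")
    # argv=None makes argparse read sys.argv; pass a list to parse explicitly.
    return parser.parse_args(argv)
```

Calling parse_args(["--data_dir", "/path/to/dataset", "--batch_size", "32"]) returns a namespace with batch_size=32 and the other values at their defaults. In a Transformers-based training script, such values would typically feed TrainingArguments fields like per_device_train_batch_size, per_device_eval_batch_size and num_train_epochs, though this README does not say how main.py uses them.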