Vision GPT2 is an image captioning model based on the GPT-2 architecture. This project includes scripts for training the model, testing it on images, running an API server, and evaluating its performance using metrics such as BLEU and ROUGE scores.
- train.py: Trains the Vision GPT2 model on the specified training data, fine-tuning it to improve accuracy on the target dataset.
- test.py: Tests the model's image captioning on a single image or a batch of images. It supports both the fine-tuned and the pre-trained models.
- infer_vision_gpt2.py: Runs inference with pre-trained models from the Hugging Face Hub, so you can caption images for your use case without training locally.
- run_api.py: Starts a simple API server that serves the captioning model over a network, making it accessible to real-time applications.
- metrics.py: Computes BLEU and ROUGE scores, which are essential for evaluating the linguistic quality of the captions generated by the model.
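To make the evaluation step concrete, here is a dependency-free sketch of simplified, single-reference versions of the two metrics. It is illustrative only: metrics.py may use library implementations with smoothing and additional ROUGE variants.

```python
import math
from collections import Counter

def _ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, hypothesis, max_n=4):
    """Unsmoothed BLEU: geometric mean of modified n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        ref, hyp = _ngrams(reference, n), _ngrams(hypothesis, n)
        overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
        precisions.append(overlap / max(sum(hyp.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any empty n-gram overlap zeroes the geometric mean
    bp = 1.0 if len(hypothesis) >= len(reference) else math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

def rouge_1_f(reference, hypothesis):
    """ROUGE-1 F1: unigram overlap scored by precision and recall."""
    ref, hyp = Counter(reference), Counter(hypothesis)
    overlap = sum(min(count, ref[tok]) for tok, count in hyp.items())
    if overlap == 0:
        return 0.0
    p, r = overlap / sum(hyp.values()), overlap / sum(ref.values())
    return 2 * p * r / (p + r)

ref = "a dog runs through the park".split()
hyp = "a dog runs through a park".split()
print(round(bleu(ref, hyp), 3), round(rouge_1_f(ref, hyp), 3))  # prints: 0.537 0.833
```

Both functions score a perfect match as 1.0 and degrade as the n-gram (BLEU) or unigram (ROUGE-1) overlap shrinks.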
To install the required dependencies, create the environment from environment.yml (e.g. `conda env create -f environment.yml`).
To train the model, run `python train.py`.
To test captioning, run `python test.py`; set the model path and model name inside the script first.
To start the API server, run `python -m uvicorn run_api:app --host 127.0.0.1 --port 8000 --reload`.
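Once the server is up, a client can send an image and read back the caption. The sketch below builds such a request with the standard library; note that the endpoint path (`/caption`) and the raw-bytes payload are assumptions for illustration, so check run_api.py for the actual route and expected request format.

```python
import urllib.request

def build_caption_request(image_bytes, url="http://127.0.0.1:8000/caption"):
    """Build a POST request carrying raw image bytes to the (hypothetical) caption endpoint."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

# With the server running, urllib.request.urlopen(req) would send the request.
req = build_caption_request(b"\x89PNG...raw image bytes here")
print(req.get_method(), req.full_url)  # prints: POST http://127.0.0.1:8000/caption
```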
To evaluate generated captions, run `python metrics.py`; set the path of the file you want to evaluate inside the script.
Contributors:
- Abhishek
- Rohit
- Avinash
- Parul
- Arya