Vision GPT2 is an image captioning model based on the GPT-2 architecture. This project includes scripts for training the model, testing it on images, running an API server, and evaluating its performance using metrics such as BLEU and ROUGE scores.
- train.py: Trains the Vision GPT2 model on the specified training data, fine-tuning it to improve accuracy on the target dataset.
- test.py: Tests the model's image captioning on a single image or a batch of images. It supports both the fine-tuned and the pre-trained models.
- infer_vision_gpt2.py: Runs inference with pre-trained models from the Hugging Face Hub, so you can caption images for your use case without training locally.
- run_api.py: Starts a simple API server that serves the captioning model over a network, making it accessible to real-time applications.
- metrics.py: Computes BLEU and ROUGE scores, which are essential for evaluating the linguistic quality of the captions generated by the model.
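To make the evaluation step concrete, here is a dependency-free sketch of simplified, single-reference versions of the two metrics. It is illustrative only: metrics.py may use library implementations with smoothing and additional ROUGE variants.

```python
import math
from collections import Counter

def _ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, hypothesis, max_n=4):
    """Unsmoothed BLEU: geometric mean of modified n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        ref, hyp = _ngrams(reference, n), _ngrams(hypothesis, n)
        overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
        precisions.append(overlap / max(sum(hyp.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any empty n-gram overlap zeroes the geometric mean
    bp = 1.0 if len(hypothesis) >= len(reference) else math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

def rouge_1_f(reference, hypothesis):
    """ROUGE-1 F1: unigram overlap scored by precision and recall."""
    ref, hyp = Counter(reference), Counter(hypothesis)
    overlap = sum(min(count, ref[tok]) for tok, count in hyp.items())
    if overlap == 0:
        return 0.0
    p, r = overlap / sum(hyp.values()), overlap / sum(ref.values())
    return 2 * p * r / (p + r)

ref = "a dog runs through the park".split()
hyp = "a dog runs through a park".split()
print(round(bleu(ref, hyp), 3), round(rouge_1_f(ref, hyp), 3))  # prints: 0.537 0.833
```

Both functions score a perfect match as 1.0 and degrade as the n-gram (BLEU) or unigram (ROUGE-1) overlap shrinks.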
To install the required dependencies, create the environment from environment.yml (e.g. `conda env create -f environment.yml`).
To train the model, run `python train.py`.
To test captioning, run `python test.py`; set the model path and model name inside the script first.
To start the API server, run `python -m uvicorn run_api:app --host 127.0.0.1 --port 8000 --reload`.
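Once the server is up, a client can send an image and read back the caption. The sketch below builds such a request with the standard library; note that the endpoint path (`/caption`) and the raw-bytes payload are assumptions for illustration, so check run_api.py for the actual route and expected request format.

```python
import urllib.request

def build_caption_request(image_bytes, url="http://127.0.0.1:8000/caption"):
    """Build a POST request carrying raw image bytes to the (hypothetical) caption endpoint."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

# With the server running, urllib.request.urlopen(req) would send the request.
req = build_caption_request(b"\x89PNG...raw image bytes here")
print(req.get_method(), req.full_url)  # prints: POST http://127.0.0.1:8000/caption
```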
To evaluate generated captions, run `python metrics.py`; set the path of the file you want to evaluate inside the script.
Contributors:
- Abhishek
- Rohit
- Avinash
- Parul
- Arya