MiniGPT4-image-caption-generation


Image caption generation using MiniGPT-4 and the pre-trained Vicuna model


Description

This repository implements an image captioner for large datasets, streamlining the creation of supervised datasets for data augmentation in image-captioning deep learning architectures.

The core framework is MiniGPT-4, paired with the pre-trained Vicuna model (13 billion parameters).

Prerequisites

You must have a GPU-enabled machine with at least 23 GB of GPU memory.
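You can verify the available GPU memory with nvidia-smi:

nvidia-smi --query-gpu=name,memory.total --format=csv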

Getting Started

Installation

git clone https://github.com/neemiasbsilva/MiniGPT-4-image-caption-implementation.git
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigptv
conda install pandas
cd ..
# merge the MiniGPT-4 sources into the captioning repo so run.sh can find them
mv MiniGPT-4/* MiniGPT-4-image-caption-implementation/
cd MiniGPT-4-image-caption-implementation
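With the environment active, a quick sanity check that PyTorch can see the GPU:

python -c "import torch; print(torch.cuda.is_available())"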

Set up the shell script

In the shell script (run.sh), you have to specify the following (a sketch appears after this list):

  • data_path: the path where your image dataset is stored.
  • beam_search: beam-search hyperparameter (a value in the range 0 to 10).
  • temperature: sampling temperature hyperparameter (between 0.1 and 1.0).
  • save_path: the path where the generated caption dataset will be saved.
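
For illustration, a minimal run.sh might look like the sketch below; the entry-point script name (generate_caption.py) and flag names are assumptions, not necessarily the repository's actual interface:

#!/bin/bash
# Illustrative names only; match them to the actual script in this repository.
data_path="/path/to/images"      # directory containing the input images
beam_search=5                    # beam-search hyperparameter
temperature=0.7                  # sampling temperature (0.1 to 1.0)
save_path="/path/to/captions"    # where the caption dataset is written

python generate_caption.py \
    --data-path "$data_path" \
    --beam-search "$beam_search" \
    --temperature "$temperature" \
    --save-path "$save_path"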

Set up the pre-trained models

  • Download the Vicuna 13B weights.

  • Set the LLM path in minigpt4/configs/models/minigpt4_vicuna0.yaml at line 15:

    llama_model: "vicuna"    # path to the downloaded Vicuna 13B weights
    
  • Download the MiniGPT-4 checkpoint (pretrained_minigpt4.pth).

  • Set the checkpoint path in eval_configs/minigpt4_eval.yaml at line 8:

    ckpt: pretrained_minigpt4.pth    # path to the downloaded MiniGPT-4 checkpoint
    
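To confirm both settings took effect, you can print the edited lines:

grep -n "llama_model" minigpt4/configs/models/minigpt4_vicuna0.yaml
grep -n "ckpt" eval_configs/minigpt4_eval.yaml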

Usage

sh run.sh
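
Captioning a large dataset can take a while, so it may be convenient to run the script in the background and follow the log:

nohup sh run.sh > captioning.log 2>&1 &
tail -f captioning.log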