MiniGPT4-image-caption-generation


Image caption generation using MiniGPT-4 and the pre-trained Vicuna model


Description

This repository implements an image captioner for large datasets, streamlining the creation of supervised datasets for data augmentation in image-captioning deep learning architectures.

The core framework is MiniGPT-4, paired with the pre-trained Vicuna model (13 billion parameters).

Prerequisites

You must have a GPU-enabled machine with at least 23 GB of GPU memory.
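You can verify the available GPU memory with nvidia-smi:

nvidia-smi --query-gpu=name,memory.total --format=csv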

Getting Started

Installation

git clone https://github.com/neemiasbsilva/MiniGPT-4-image-caption-implementation.git
git clone https://github.com/Vision-CAIR/MiniGPT-4.git
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigptv
conda install pandas
cd ..
# merge the MiniGPT-4 sources into the captioning repo so run.sh can find them
mv MiniGPT-4/* MiniGPT-4-image-caption-implementation/
cd MiniGPT-4-image-caption-implementation
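With the environment active, a quick sanity check that PyTorch can see the GPU:

python -c "import torch; print(torch.cuda.is_available())"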

Set up the shell script

In the shell script (run.sh), you have to specify the following (a sketch appears after this list):

  • data_path: the path where your image dataset is stored.
  • beam_search: beam-search hyperparameter (a value in the range 0 to 10).
  • temperature: sampling temperature hyperparameter (between 0.1 and 1.0).
  • save_path: the path where the generated caption dataset will be saved.
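
For illustration, a minimal run.sh might look like the sketch below; the entry-point script name (generate_caption.py) and flag names are assumptions, not necessarily the repository's actual interface:

#!/bin/bash
# Illustrative names only; match them to the actual script in this repository.
data_path="/path/to/images"      # directory containing the input images
beam_search=5                    # beam-search hyperparameter
temperature=0.7                  # sampling temperature (0.1 to 1.0)
save_path="/path/to/captions"    # where the caption dataset is written

python generate_caption.py \
    --data-path "$data_path" \
    --beam-search "$beam_search" \
    --temperature "$temperature" \
    --save-path "$save_path"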

Set up the pre-trained models

  • Download the Vicuna 13B weights.

  • Set the LLM path in minigpt4/configs/models/minigpt4_vicuna0.yaml at line 15:

    llama_model: "vicuna"    # path to the downloaded Vicuna 13B weights
    
  • Download the MiniGPT-4 checkpoint (pretrained_minigpt4.pth).

  • Set the checkpoint path in eval_configs/minigpt4_eval.yaml at line 8:

    ckpt: pretrained_minigpt4.pth    # path to the downloaded MiniGPT-4 checkpoint
    
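To confirm both settings took effect, you can print the edited lines:

grep -n "llama_model" minigpt4/configs/models/minigpt4_vicuna0.yaml
grep -n "ckpt" eval_configs/minigpt4_eval.yaml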

Usage

sh run.sh
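
Captioning a large dataset can take a while, so it may be convenient to run the script in the background and follow the log:

nohup sh run.sh > captioning.log 2>&1 &
tail -f captioning.log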