


Caption Generator App

An intelligent system for generating captions from uploaded images.

ML Operations:

  1. Dataset
    COCO: Common Objects in Context (Microsoft)
    https://cocodataset.org/#home

  2. DL Model
    Vision Encoder-Decoder Model (ViT + GPT-2)
    Link: https://huggingface.co/docs/transformers/v4.29.1/en/model_doc/vision-encoder-decoder#transformers.VisionEncoderDecoderModel
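COCO distributes its captions as JSON annotation files: each entry in `images` has an `id` and a `file_name`, and each entry in `annotations` links an `image_id` to one `caption` (most images have about five). The source does not show how the app reads the dataset, so this is only a minimal sketch assuming that standard layout; the `SAMPLE` data and the `captions_by_file` helper are hypothetical, for illustration.

```python
import json
from collections import defaultdict

# Hypothetical, tiny stand-in for a COCO captions file such as
# annotations/captions_train2017.json (same top-level keys, far fewer entries).
SAMPLE = """
{
  "images": [
    {"id": 1, "file_name": "000000000001.jpg"},
    {"id": 2, "file_name": "000000000002.jpg"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "caption": "A dog runs on the beach."},
    {"id": 11, "image_id": 1, "caption": "A brown dog playing in the sand. "},
    {"id": 12, "image_id": 2, "caption": "Two people riding bicycles."}
  ]
}
"""


def captions_by_file(coco: dict) -> dict[str, list[str]]:
    """Group every reference caption under the file name of its image."""
    id_to_file = {img["id"]: img["file_name"] for img in coco["images"]}
    grouped: dict[str, list[str]] = defaultdict(list)
    for ann in coco["annotations"]:
        # Strip stray whitespace, which is common in crowd-sourced captions.
        grouped[id_to_file[ann["image_id"]]].append(ann["caption"].strip())
    return dict(grouped)


if __name__ == "__main__":
    for file_name, caps in captions_by_file(json.loads(SAMPLE)).items():
        print(file_name, caps)
```

The resulting (image file, captions) pairs are the kind of supervision an image-captioning model is trained on.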

[Figure: vision-encoder-decoder]
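Hugging Face Transformers exposes the ViT + GPT-2 combination through `VisionEncoderDecoderModel`. The sketch below assumes `torch` and `transformers` are installed. It builds a deliberately tiny, randomly initialised model from configs so it runs offline; every size in it is an arbitrary assumption, not the app's setting, and it does not produce meaningful captions. It only shows the shapes that flow from the image encoder to the text decoder.

```python
import torch
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Tiny, randomly initialised configs (assumed sizes) so nothing is downloaded.
encoder_cfg = ViTConfig(
    image_size=32, patch_size=8, num_channels=3,
    hidden_size=64, num_hidden_layers=2, num_attention_heads=2,
    intermediate_size=128,
)
decoder_cfg = GPT2Config(
    vocab_size=100, n_positions=64, n_embd=64, n_layer=2, n_head=2,
)

# Marks the GPT-2 config as a decoder and enables cross-attention,
# so every decoder layer can attend to the ViT output.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder_cfg, decoder_cfg)
model = VisionEncoderDecoderModel(config=config)
model.eval()

pixel_values = torch.randn(1, 3, 32, 32)       # one fake RGB image
decoder_input_ids = torch.tensor([[0, 5, 7]])  # three hypothetical token ids

with torch.no_grad():
    out = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)

# Step 01: image embedding from ViT, (batch, 16 patches + 1 [CLS], hidden)
print(out.encoder_last_hidden_state.shape)  # torch.Size([1, 17, 64])
# Step 02: GPT-2 next-token scores, (batch, decoder tokens, vocab_size)
print(out.logits.shape)                     # torch.Size([1, 3, 100])
```

With pretrained weights, the same class can be built with `VisionEncoderDecoderModel.from_encoder_decoder_pretrained("google/vit-base-patch16-224-in21k", "gpt2")`; which checkpoints CaptionGeneratorApp actually uses is not stated here.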

Description:
The Vision Encoder-Decoder model is used when the system receives an image as input and produces text as output:
IMAGE ==> TENSOR EMBEDDING ==> TEXT
Step 01: A pretrained transformer-based vision model (ViT) acts as the encoder;
it maps the IMAGE ==> TENSOR EMBEDDING.
Step 02: A pretrained language model (GPT-2) acts as the decoder;
it maps the TENSOR EMBEDDING ==> TEXT.
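The size of the TENSOR EMBEDDING follows from how ViT tokenises an image: the image is cut into non-overlapping square patches, each patch becomes one token, and one extra [CLS] token is prepended. A small worked example, assuming ViT-Base/16 defaults (224x224 input, 16x16 patches, hidden size 768), since the exact checkpoint is not named above:

```python
def vit_embedding_shape(image_size: int = 224, patch_size: int = 16,
                        hidden_size: int = 768) -> tuple[int, int]:
    """Return (sequence_length, hidden_size) of a ViT encoder's output.

    Each non-overlapping patch_size x patch_size patch becomes one token,
    and one [CLS] token is added in front of the patch tokens.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    num_patches = patches_per_side ** 2          # 14 * 14 = 196
    return num_patches + 1, hidden_size          # + [CLS] -> 197


print(vit_embedding_shape())  # (197, 768)
```

Under these assumptions the GPT-2 decoder cross-attends over 197 vectors of size 768. The smallest GPT-2 also uses a hidden size of 768, so in that pairing the encoder output needs no extra projection before the decoder.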

Software Development:

The following technologies were used to develop CaptionGeneratorApp:

  • Frontend: HTML + CSS + jQuery
  • Backend: Python + Django
  • Database: Postgres
  • ML-modeling: PyTorch

Database Model:

[Image: database model diagram]

Web System Interface

Start the Django development server with:

  $ python manage.py runserver

home.html

[Screenshot: home.html]

report.html

[Screenshot: report.html]