Intelligent System for generating captions from uploaded images
-
Dataset
COCO - Common Objects in Context (Microsoft)
https://cocodataset.org/#home -
DL Model
Vision Encoder-Decoder Model (ViT + GPT-2)
Link: https://huggingface.co/docs/transformers/v4.29.1/en/model_doc/vision-encoder-decoder#transformers.VisionEncoderDecoderModel
Description:
The Vision Encoder-Decoder Model can be used when the system takes an image as input and generates text as output:
IMAGE ==> TENSOR EMBEDDING ==> TEXT
Step 01: Pretrained transformer-based vision model ==> this is the encoder (ViT)
takes the IMAGE ==> TENSOR EMBEDDING
Step 02: Pretrained language model ==> this is the decoder (GPT-2)
takes the TENSOR EMBEDDING ==> TEXT
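The two steps above can be sketched in plain PyTorch to make the tensor shapes concrete. This is a minimal illustration, not the real model: the patch-embedding layer stands in for the full ViT encoder, and a single linear head stands in for the GPT-2 decoder (which in the real model cross-attends to the embedding). The shapes assume ViT-Base (224x224 input, 16x16 patches, 768-dim embeddings) and the GPT-2 vocabulary.

```python
import torch
import torch.nn as nn

# Step 01 (sketch): ViT encoder -- a 224x224 RGB image is split into
# 16x16 patches, each projected to a 768-dim embedding, and a [CLS]
# token is prepended. A strided Conv2d implements the patch projection.
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
image = torch.randn(1, 3, 224, 224)            # IMAGE
patches = patch_embed(image)                   # (1, 768, 14, 14)
tokens = patches.flatten(2).transpose(1, 2)    # (1, 196, 768)
cls = torch.zeros(1, 1, 768)
embedding = torch.cat([cls, tokens], dim=1)    # TENSOR EMBEDDING: (1, 197, 768)

# Step 02 (sketch): GPT-2 decoder -- consumes the embedding and emits
# text token ids. Here a linear head over the GPT-2 vocabulary stands
# in for the whole decoder.
vocab_size = 50257                             # GPT-2 vocabulary size
lm_head = nn.Linear(768, vocab_size)
logits = lm_head(embedding)                    # (1, 197, 50257)
next_token = logits[:, -1].argmax(-1)          # greedy pick of a TEXT token id
```

In the real app the two stands-ins are replaced by the pretrained ViT and GPT-2 checkpoints combined through the VisionEncoderDecoderModel class linked above.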
The following technologies were used to develop CaptionGeneratorApp:
- Frontend: HTML + CSS + jQuery
- Backend: Python + Django
- Database: PostgreSQL
- ML-modeling: PyTorch
To start the development server, run: $ python manage.py runserver
Templates:
- home.html
- report.html