forbes110/PLEDGE--Paragraph-LEvel-image-Description-GEneration

Apply an end-to-end model structure (ViT + GPT) to describe images in more detail, rather than traditional image captioning that only provides object detections or a few simple sentences.

Python

No issues in this repository yet.