forbes110/PLEDGE--Paragraph-LEvel-image-Description-GEneration
Applies an end-to-end ViT + GPT model to generate paragraph-level image descriptions, in contrast to traditional image captioning, which yields only detected objects or a few simple sentences.
Python
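To make the ViT + GPT data flow in the description concrete, here is a toy, standard-library-only sketch. It is not taken from this repository. It only illustrates the two stages such a pipeline typically has: cutting the image into fixed-size patches (the input sequence of a ViT encoder) and a greedy autoregressive loop that emits text tokens conditioned on the visual tokens. The names `split_into_patches`, `greedy_decode`, and `stub_next_token` are hypothetical, and the stub stands in for a trained GPT decoder.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale H x W grid, kept simple for illustration


def split_into_patches(image: Image, patch_size: int) -> List[List[float]]:
    """Flatten an H x W image into a row-major sequence of patch vectors."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ])
    return patches


def greedy_decode(
    next_token: Callable[[Sequence, List[int]], int],
    visual_tokens: Sequence,
    bos_id: int,
    eos_id: int,
    max_len: int,
) -> List[int]:
    """Emit one token per step; each step sees the image tokens and the text so far."""
    tokens = [bos_id]
    for _ in range(max_len):
        token = next_token(visual_tokens, tokens)
        tokens.append(token)
        if token == eos_id:
            break
    return tokens


def stub_next_token(visual_tokens: Sequence, tokens: List[int]) -> int:
    # Hypothetical stand-in for a trained decoder: counts up, then emits EOS (0).
    return len(tokens) if len(tokens) < 3 else 0


# A 224 x 224 image with 16 x 16 patches yields 14 * 14 = 196 visual tokens.
patches = split_into_patches([[0.0] * 224 for _ in range(224)], 16)
print(len(patches))  # 196
print(greedy_decode(stub_next_token, patches, bos_id=9, eos_id=0, max_len=10))
```

In a real model the patch vectors would be linearly projected and passed through transformer layers, and `next_token` would be a GPT decoder attending to the encoder output. A paragraph-level description then amounts to letting the loop run for many more steps than a one-sentence caption.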