captioning

There are 81 repositories under the captioning topic.

facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Language: Python 5.6k 110 657941
roboflow/maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Language: Python 2.6k 34 44216
fpgaminer/joycaption
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Language: Jupyter Notebook 830 7 3047
ltguo19/VSUA-Captioning
Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019
Language: Python 258 15 1724
DavidHuji/CapDec
CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings)
Language: Python 192 4 1721
Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
Language: Python 115 2 36
HaydenFaulkner/Tennis
A Tennis dataset and models for event detection & commentary generation
Language: Python 109 5 518
mitvis/vistext
VisText is a benchmark dataset for semantically rich chart captioning.
Language: Jupyter Notebook 95 6 96
drethage/fully-convolutional-point-network
Fully-Convolutional Point Networks for Large-Scale Point Clouds
Language: Python 86 13 822
Chen-Yang-Liu/Awesome-RS-Temporal-VLM
Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey
82 2 14
audio-captioning/clotho-dataset
Python code for handling the Clotho dataset.
Language: Python 81 4 315
Mauville/MedCLIP
Medical image captioning using OpenAI's CLIP
Language: Jupyter Notebook 72 4 716
wangleihitcs/MedicalReportGeneration
A base TensorFlow project for medical report generation
Language: Python 71 3 618
ParitoshParmar/MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Language: Python 69 3 515
aimagelab/pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Language: Python 64 5 79
TheShadow29/VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
Language: Python 59 2 218
42lux/CaptainCaption
A Gradio-based image captioning tool that uses the GPT-4-Vision API to generate detailed descriptions of images.
Language: Python 58 2 39
Labbeti/aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
Language: Python 44 2 113
lucidrains/AoA-pytorch
A PyTorch implementation of the Attention on Attention module (both self and guided variants), for Visual Question Answering
Language: Python 43 2 05
DavidMChan/caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
Language: Python 40 2 34
audio-captioning/dcase-2020-baseline
Audio captioning baseline system for the DCASE 2020 challenge.
Language: Python 38 2 1211
deepgram-devs/video-chat
Sample app that adds live captioning to a WebRTC video session using the Deepgram API.
Language: JavaScript 37 13 114
CurryYuan/X-Trans2Cap
[CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
Language: Python 34 3 113
aimagelab/camel
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
Language: Python 29 4 1212
RyanLiut/awesome-diverse-captioning
Papers on *diverse* image captioning (plus a few on video captioning)
26 4 03
alecwangcq/show-attend-and-tell
Language: Jupyter Notebook 25 2 911
ebu/ebu-tt-live-toolkit
Toolkit for supporting the EBU-TT Live specification
Language: Python 25 9 35710
elbayadm/PaperNotes
My notes on some Deep Learning papers
Language: HTML 24 4 04
FeiElysia/awesome-zero-shot-captioning
A curated list of zero-shot captioning papers
24 2 11
AdrianHsu/S2VT-seq2seq-video-captioning-attention
S2VT (seq2seq) video captioning with Bahdanau and Luong attention, implemented in TensorFlow
Language: Python 19 3 410
aimagelab/PMA-Net
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023
Language: Python 17 8 52
audio-captioning/caption-evaluation-tools
Tools for the evaluation of audio captioning.
Language: Jupyter Notebook 16 4 12
hassanhub/R3Transformer
Official Python implementation of R3-Transformer
Language: Python 15 3 00
rayandrew/indonesian-image-captioning
Indonesian Image Captioning using Attention-based Semantic Compositional Networks
Language: Jupyter Notebook 14 3 05
ImKeTT/ZeroGen
[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles (PyTorch implementation)
Language: Python 12 1 30
nssharmaofficial/reddit-hole
Automated Reddit scraper and video creator
Language: Python 12 2 62
