image-captioning

There are 874 repositories under the image-captioning topic.

salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook · 10.3k · 96 · 6911k
salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Language: Jupyter Notebook · 5.1k · 31 · 205674
OpenGVLab/InternGPT
InternGPT (iGPT) is an open-source demo platform for easily showcasing your AI models. It currently supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
Language: Python · 3.2k · 41 · 51231
sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Language: Python · 2.8k · 24 · 191721
OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Language: Python · 2.5k · 20 · 364248
ttengwang/Caption-Anything
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything
Language: Python · 1.7k · 15 · 24105
peteanderson80/bottom-up-attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Language: Jupyter Notebook · 1.4k · 25 · 120377
imaginary-cloud/CameraManager
Simple Swift class to provide all the configurations you need to create custom camera view in your app
Language: Swift · 1.4k · 39 · 208325
NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Language: Python · 1.3k · 16 · 1973
microsoft/Oscar
Oscar and VinVL
Language: Python · 1k · 25 · 202252
ruotianluo/self-critical.pytorch
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.
Language: Python · 999 · 20 · 279277
YehLi/xmodaler
X-modaler is a versatile, high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language: Python · 970 · 28 · 63105
jhc13/taggui
Tag manager and captioner for image datasets
Language: Python · 918 · 15 · 21643
yunjey/show-attend-and-tell
TensorFlow Implementation of "Show, Attend and Tell"
Language: Jupyter Notebook · 906 · 38 · 86324
SkalskiP/awesome-foundation-and-multimodal-models
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Language: Python · 607 · 27 · 546
kdexd/virtex
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
Language: Python · 560 · 13 · 2961
kuanghuei/SCAN
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
Language: Python · 556 · 9 · 61114
aimagelab/meshed-memory-transformer
Meshed-Memory Transformer for Image Captioning. CVPR 2020
Language: Python · 528 · 12 · 97135
subho406/OmniNet
Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain
Language: Python · 512 · 19 · 758
gokayfem/ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Language: Python · 472 · 6 · 12146
ufal/neuralmonkey
An open-source tool for sequence learning in NLP built on TensorFlow.
Language: Python · 412 · 32 · 396103
MahanFathi/CS231
Complete Assignments for CS231n: Convolutional Neural Networks for Visual Recognition
Language: Jupyter Notebook · 377 · 11 · 3152
scopeInfinity/Video2Description
Video to Text: a natural-language description generator for a given video. [Video Captioning]
Language: Python · 342 · 8 · 2069
jiasenlu/AdaptiveAttention
Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
Language: Jupyter Notebook · 335 · 13 · 2574
husthuaan/AoANet
Code for paper "Attention on Attention for Image Captioning". ICCV 2019
Language: Python · 333 · 7 · 062
yashk2810/Image-Captioning
Image Captioning using InceptionV3 and beam search
Language: Jupyter Notebook · 327 · 12 · 11122
sethuiyer/Image-to-Image-Search
A reverse image search engine powered by Elasticsearch and TensorFlow
Language: Python · 322 · 13 · 1550
krasserm/fairseq-image-captioning
Transformer-based image captioning extension for pytorch/fairseq
Language: Python · 314 · 13 · 2857
dabasajay/Image-Caption-Generator
A neural network that generates captions for an image using a CNN and an RNN with beam search.
Language: Python · 301 · 6 · 1981
aimagelab/show-control-and-tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
Language: Python · 282 · 9 · 3962
JDAI-CV/image-captioning
Implementation of 'X-Linear Attention Networks for Image Captioning' [CVPR 2020]
Language: Python · 273 · 4 · 3455
anuragmishracse/caption_generator
A modular library built on top of Keras and TensorFlow to generate a caption in natural language for any input image.
Language: Python · 266 · 19 · 39119
DataTurks/DataTurks
ML data annotation made super easy for teams. Just upload your data, add your team, and build training/evaluation datasets in hours.
Language: JavaScript · 266 · 8 · 23124
saahiluppal/catr
Image Captioning Using Transformer
Language: Python · 262 · 4 · 2753
yxuansu/MAGIC
Language Models Can See: Plugging Visual Controls in Text Generation
Language: Python · 257 · 11 · 827
peteanderson80/Up-Down-Captioner
Automatic image captioning model based on Caffe, using features from bottom-up attention.
Language: Jupyter Notebook · 245 · 8 · 2869