multimodality

There are 117 repositories under the multimodality topic.

lucidrains/big-sleep
A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun
Language: Python | 2.6k 46 87305
roboflow/multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
Language: Python | 968 14 768
PreferredAI/cornac
A Comparative Framework for Multimodal Recommender Systems
Language: Python | 830 25 148134
ArrowLuo/CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Language: Python | 794 12 109116
hymie122/RAG-Survey
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
781 20 257
fnzhan/Generative-AI
[TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era
Language: TeX | 763 45 258
aimclub/FEDOT
Automated modeling and machine learning framework FEDOT
Language: Python | 615 10 53284
BAAI-Agents/Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
Language: Python | 581 10 853
BradyFU/Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Language: Python | 561 15 1128
jshilong/GPT4RoI
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Language: Python | 460 8 4224
afiaka87/clip-guided-diffusion
A CLI tool/Python module for generating images from text using guided diffusion and CLIP from OpenAI.
Language: Python | 449 12 1662
zengyan-97/X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
Language: Python | 435 5 3353
HazyResearch/fonduer
A knowledge base construction engine for richly formatted data
Language: Python | 403 27 17976
lium-lst/nmtpytorch
Sequence-to-Sequence Framework in PyTorch
Language: Jupyter Notebook | 392 17 2351
kyegomez/CM3Leon
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images
Language: Python | 336 21 1517
microsoft/UniVL
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Language: Python | 330 10 4454
OmicsML/dance
DANCE: a deep learning library and benchmark platform for single-cell analysis
Language: Python | 328 6 4130
soujanyaporia/multimodal-sentiment-analysis
Attention-based multimodal fusion for sentiment analysis
Language: Python | 312 7 1574
Yutong-Zhou-cv/Awesome-Multimodality
A Survey on multimodal learning research.
278 10 119
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Language: Python | 273 4 2519
kyegomez/Med-PaLM
Towards Generalist Biomedical AI
Language: Python | 257 7 1536
Liang-ZX/VectorNet
PyTorch implementation of the CVPR 2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”
Language: Jupyter Notebook | 216 3 643
srvk/how2-dataset
This repository contains the code and metadata of the How2 dataset
Language: Python | 150 12 3017
florencejt/fusilli
A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
Language: Python | 148 5 112
kyegomez/NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Language: Python | 133 7 36
BiomedSciAI/fuse-med-ml
A Python framework accelerating ML-based discovery in the medical field by encouraging code reuse. Batteries included :)
Language: Python | 128 11 6435
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
Language: Python | 120 6 52
YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Language: HTML | 119 4 17
MMStar-Benchmark/MMStar
This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Language: Python | 115 1 81
senwu/emmental
A deep learning framework for building multimodal multi-task learning systems.
Language: Python | 106 11 1518
FoundationVision/GenerateU
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Language: Python | 103 5 106
kyegomez/swarms-pytorch
Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊
Language: Python | 90 4 36
lucidrains/mirasol-pytorch
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google DeepMind, in PyTorch
Language: Python | 85 7 41
kyegomez/PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
Language: Python | 75 3 97
akashe/Multimodal-action-recognition
Code for selecting an action based on multimodal inputs. In this case, the inputs are voice and text.
Language: Python | 70 1 613
amazon-science/gluonmm
A library of transformer models for computer vision and multi-modality research
Language: Python | 49 3 02
