multimodality

There are 117 repositories under the multimodality topic.

  • lucidrains/big-sleep

    A simple command-line tool for text-to-image generation, using OpenAI's CLIP and a BigGAN. The technique was originally created by https://twitter.com/advadnoun

    Language: Python, Stars: 2.6k
  • roboflow/multimodal-maestro

    Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥

    Language: Python, Stars: 968
  • PreferredAI/cornac

    A Comparative Framework for Multimodal Recommender Systems

    Language: Python, Stars: 830
  • ArrowLuo/CLIP4Clip

    An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

    Language: Python, Stars: 794
  • hymie122/RAG-Survey

    Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".

  • fnzhan/Generative-AI

    [TPAMI 2023] Multimodal Image Synthesis and Editing: The Generative AI Era

    Language: TeX, Stars: 763
  • aimclub/FEDOT

    Automated modeling and machine learning framework FEDOT

    Language: Python, Stars: 615
  • BAAI-Agents/Cradle

    The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

    Language: Python, Stars: 581
  • BradyFU/Woodpecker

    ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

    Language: Python, Stars: 561
  • jshilong/GPT4RoI

    GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

    Language: Python, Stars: 460
  • afiaka87/clip-guided-diffusion

    A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.

    Language: Python, Stars: 449
  • zengyan-97/X-VLM

    X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)

    Language: Python, Stars: 435
  • HazyResearch/fonduer

    A knowledge base construction engine for richly formatted data

    Language: Python, Stars: 403
  • lium-lst/nmtpytorch

    Sequence-to-Sequence Framework in PyTorch

    Language: Jupyter Notebook, Stars: 392
  • kyegomez/CM3Leon

    An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", a multimodal model that uses just a decoder to generate both text and images

    Language: Python, Stars: 336
  • microsoft/UniVL

    An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

    Language: Python, Stars: 330
  • OmicsML/dance

    DANCE: a deep learning library and benchmark platform for single-cell analysis

    Language: Python, Stars: 328
  • soujanyaporia/multimodal-sentiment-analysis

    Attention-based multimodal fusion for sentiment analysis

    Language: Python, Stars: 312
  • Yutong-Zhou-cv/Awesome-Multimodality

    A survey of multimodal learning research.

  • MMMU-Benchmark/MMMU

    This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

    Language: Python, Stars: 273
  • kyegomez/Med-PaLM

    Towards Generalist Biomedical AI

    Language: Python, Stars: 257
  • Liang-ZX/VectorNet

    PyTorch implementation of the CVPR 2020 paper "VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation"

    Language: Jupyter Notebook, Stars: 216
  • srvk/how2-dataset

    This repository contains the code and metadata of the How2 dataset

    Language: Python, Stars: 150
  • florencejt/fusilli

    A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸

    Language: Python, Stars: 148
  • kyegomez/NaViT

    My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

    Language: Python, Stars: 133
  • BiomedSciAI/fuse-med-ml

    A Python framework accelerating ML-based discovery in the medical field by encouraging code reuse. Batteries included :)

    Language: Python, Stars: 128
  • kyegomez/PALI3

    Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"

    Language: Python, Stars: 120
  • YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

    🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).

    Language: HTML, Stars: 119
  • MMStar-Benchmark/MMStar

    This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

    Language: Python, Stars: 115
  • senwu/emmental

    A deep learning framework for building multimodal multi-task learning systems.

    Language: Python, Stars: 106
  • FoundationVision/GenerateU

    [CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

    Language: Python, Stars: 103
  • kyegomez/swarms-pytorch

    Swarming algorithms like PSO, Ant Colony, Sakana, and more in PyTorch 😊

    Language: Python, Stars: 90
  • lucidrains/mirasol-pytorch

    Implementation of 🌻 Mirasol, a SOTA multimodal autoregressive model from Google DeepMind, in PyTorch

    Language: Python, Stars: 85
  • kyegomez/PALI

    Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"

    Language: Python, Stars: 75
  • akashe/Multimodal-action-recognition

    Code for selecting an action based on multimodal inputs. In this case, the inputs are voice and text.

    Language: Python, Stars: 70
  • amazon-science/gluonmm

    A library of transformer models for computer vision and multi-modality research

    Language: Python, Stars: 49