multimodal-datasets

There are 21 repositories under the multimodal-datasets topic.

salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook 10.9k 96 7001.1k
remyxai/VQASynth
Compose multimodal datasets 🎹
Language: Python 475 7 2220
drmuskangarg/Multimodal-datasets
This repository is built in association with our position paper, "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As part of this release, we share information about recent multimodal datasets that are available for research purposes. We found that although 100+ multimodal language resources are available in the literature for various NLP tasks, publicly available multimodal datasets remain under-explored for reuse in subsequent problem domains.
283 2 124
AnkurDeria/MFT
PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
Language: Jupyter Notebook 219 4 1423
wisdomikezogwo/quilt1m
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
Language: Python 168 4 338
yuanxiaosc/Multimodal-short-video-dataset-and-baseline-classification-model
A dataset of 500,000 multimodal short videos, with baseline classification models (TensorFlow 2.0).
Language: Jupyter Notebook 128 3 136
marslanm/Multimodality-Representation-Learning
This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the recently accepted survey (https://dl.acm.org/doi/abs/10.1145/3617833).
80 8 07
roboflow/rf100-vl
Code from the paper "Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models"
Language: Python 805
piresramon/gpt-4-enem
Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam.
Language: Python 46 3 210
Yuco-Z/Awesome-Multi-Modal-Dialog
[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics
38 2 14
JunweiLiang/FVTA_MemexQA
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
Language: Python 32 3 215
OlehOnyshchak/pyWikiMM
Collects a multimodal dataset of Wikipedia articles and their images
Language: Python 16 1 112
ddw2AIGROUP2CQUPT/Large-Scale-Multimodal-Face-Datasets
Million-Scale Face/Human-Scene Image-Text Datasets
140
deepmancer/vlm-toolbox
Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation
Language: Jupyter Notebook 11 1 02
lujiaying/MUG-Bench
Data and code for the Findings of EMNLP'23 paper "MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields"
Language: Python 9 3 121
NUSTM/EMDRC
Towards Explainable Multimodal Depression Recognition for Clinical Interviews
8 2 0
gcunhase/AnnotatedMV-PreProcessing
Pre-Processing of Annotated Music Video Corpora (COGNIMUSE and DEAP)
Language: Python 5 1 1
clp-research/language-models-multimodal-tasks
Official Git repository for "Hakimov, S., and Schlangen, D. (2023). Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks. Findings of the Association for Computational Linguistics (ACL 2023 Findings)"
Language: Python 3 1 11
GeorgeTouros/video-soundtrack-evaluation
Create a large, well-managed, and clean dataset for the task of music composition for video soundtracks.
Language: Jupyter Notebook 3 3 00
OlehOnyshchak/WikiImageRecommendation
Image Recommendation for Wikipedia Articles
Language: Jupyter Notebook 3 1 110
Damorgal/Multimodal-Research-experiments
All experiments were done to classify multimodal data.
2 1 01
