oyly16's Stars
declare-lab/MELD
MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation
YebowenHu/MeetingBank-utils
yt-dlp/yt-dlp
A feature-rich command-line audio/video downloader
meta-llama/llama3
The official Meta Llama 3 GitHub site
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Demo apps showcase Meta Llama for WhatsApp & Messenger.
pytorch/torchchat
Run PyTorch LLMs locally on servers, desktops, and mobile devices
IDEA-Research/OpenSeeD
[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"
aistairc/FineBio
Data and code for the paper "FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation"
psychoinformatics-de/remodnav
Robust Eye Movement Detection for Natural Viewing
ut-vision/S2DHand
ut-vision/ActionVOS
[ECCV 2024 Oral] ActionVOS: Actions as Prompts for Video Object Segmentation
unslothai/unsloth
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
ErikEkstedt/TurnGPT
TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog
sangmin-git/MMSI
Code for "Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations" (CVPR 2024 Oral)
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
ollama/ollama
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
facebookresearch/VMZ
VMZ: Model Zoo for Video Modeling
PKU-YuanGroup/Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
thuiar/MIntRec
MIntRec: A New Dataset for Multimodal Intent Recognition (ACM MM 2022)
mehdifatan/Egocom-IRI-UPC
ai4r/AIR-Act2Act
SALT-NLP/PersuationGames
[ACL 2023, Findings] Source code for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games"
relh/moves
[CVPR 2023] MOVES: Manipulated Objects in Video Enable Segmentation
postech-ami/SMILE-Dataset
[NAACL'24] Repository for "SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models"
jayleicn/VideoLanguageFuturePred
[EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction
sotopia-lab/sotopia
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
0nutation/SpeechAgents
SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems
facebookresearch/EgoCom-Dataset
EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset
facebookresearch/EasyComDataset
The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the *cocktail party effect* from an augmented-reality (AR)-motivated, multi-sensor egocentric world view.
CHATS-lab/KokoMind
KokoMind: Can LLMs Understand Social Interactions?