drock577's Stars
HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with a standardized output format
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
cvat-ai/cvat
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
voxel51/fiftyone
Refine high-quality datasets and visual AI models
TadasBaltrusaitis/OpenFace
OpenFace – a state-of-the-art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
MrNeRF/awesome-3D-gaussian-splatting
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
autodistill/autodistill
Images to inference with no labeling (use foundation models to train supervised models).
nvkelso/natural-earth-vector
A global, public domain map dataset available at three scales and featuring tightly integrated vector and raster data.
Purfview/whisper-standalone-win
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
xiaobai1217/Awesome-Video-Datasets
Video datasets
NVlabs/InstantSplat
InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds
juanmc2005/diart
A Python package to build AI-powered real-time audio applications
Rudrabha/Lip2Wav
This is the repository containing the code for our CVPR 2020 paper titled "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"
zenml-io/awesome-open-data-annotation
Open Source Data Annotation & Labeling Tools
pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
microsoft/XPretrain
Multi-modality pre-training
supervisely/supervisely
Supervisely SDK for Python - a convenient way to automate, customize, and extend the Supervisely Platform for your computer vision task
CMU-MultiComp-Lab/CMU-MultimodalSDK
jimmy646/violin
Data and code for CVPR 2020 paper: "VIOLIN: A Large-Scale Dataset for Video-and-Language Inference"
cvqluu/simple_diarizer
Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code
jssprz/video_captioning_datasets
Summary of Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
DmZhukov/CrossTask
Chris10M/Lip2Speech
A pipeline to read lips and generate speech for the read content, i.e., Lip to Speech Synthesis.
Zhao-Tian-yi/RSDet
yochaiye/LipVoicer
Official code implementation for the ICLR paper "LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading"
chahuja/pats
PATS Dataset. Aligned Pose-Audio-Transcripts and Style for co-speech gesture research
Jerrry-Li/YOLO-FIRI
JeongHun0716/Personalized-Lip-Reading
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language (AAAI 2025)
3loi/MSP_Face
jinchiniao/LSHUC
BMVC'23 Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading