roudimit's Stars
yangshun/tech-interview-handbook
💯 Curated coding interview preparation materials for busy software engineers
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
google-research/tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
openai/gpt-2
Code for the paper "Language Models are Unsupervised Multitask Learners"
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
common-voice/common-voice
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
OFA-Sys/ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
jitsi/jiwer
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
DmitryRyumin/INTERSPEECH-2023-Papers
INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
huggingface/community-events
Place where folks can contribute to 🤗 community events
YuanGongND/ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
facebookresearch/muavic
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Sally-SH/VSP-LLM
microsoft/Pengi
An Audio Language model for Audio Tasks
YuanGongND/cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
mpc001/auto_avsr
Auto-AVSR: Lip-Reading Sentences Project
common-voice/cv-dataset
Metadata and versioning details for the Common Voice dataset
SamsungLabs/SummaryMixing
This repository implements SummaryMixing, a simpler, faster, and much cheaper replacement for self-attention in automatic speech recognition (see: https://arxiv.org/abs/2307.07421). The code is ready to be used with the SpeechBrain toolkit.
robinhad/kruk
Ukrainian instruction-tuned language models and datasets
HarunoriKawano/BEST-RQ
Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in PyTorch.
IDRnD/VoxTube
The VoxTube dataset official repository
ahaliassos/raven
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
YuanGongND/uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
roger-tseng/av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
Alexander-H-Liu/dinosr
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
DanielMengLiu/AudioVisualLip
roudimit/c2kd
Code for the C2KD paper (ICASSP 2023)
YasserdahouML/VSR_test_set
WildVSR