roudimit

PhD Student at MIT CSAIL

Massachusetts Institute of Technology, Cambridge, Massachusetts

roudimit's Stars

yangshun/tech-interview-handbook
💯 Curated coding interview preparation materials for busy software engineers
Language: TypeScript 120k 2.1k 10514.8k
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Language: Python 38.1k 380 3206.1k
google-research/tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
27.6k 291 432.3k
openai/gpt-2
Code for the paper "Language Models are Unsupervised Multitask Learners"
Language: Python 22.7k 631 2685.5k
huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Language: Python 19.4k 277 3k2.7k
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
Language: Jupyter Notebook 11k 144 3701.1k
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
Language: Python 3.8k 48 176287
common-voice/common-voice
Common Voice is part of Mozilla's initiative to help teach machines how real people speak.
Language: TypeScript 3.3k 133 2.3k843
OFA-Sys/ONE-PEACE
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Language: Python 986 14 5764
jitsi/jiwer
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Language: Python 660 15 49100
DmitryRyumin/INTERSPEECH-2023-Papers
INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
616 87 442
huggingface/community-events
Place where folks can contribute to 🤗 community events
Language: Jupyter Notebook 405 52 3297
YuanGongND/ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Language: Python 399 15 5338
facebookresearch/muavic
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Language: Python 370 13 2332
Sally-SH/VSP-LLM
Language: Python 303 7 625
microsoft/Pengi
An Audio Language model for Audio Tasks
Language: Python 297 14 1416
YuanGongND/cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
Language: Python 244 5 2923
mpc001/auto_avsr
Auto-AVSR: Lip-Reading Sentences Project
Language: Python 190 5 3841
common-voice/cv-dataset
Metadata and versioning details for the Common Voice dataset
Language: JavaScript 143 18 2715
SamsungLabs/SummaryMixing
This repository implements SummaryMixing, a simpler, faster and much cheaper replacement for self-attention in automatic speech recognition (see: https://arxiv.org/abs/2307.07421). The code is ready to be used with the SpeechBrain toolkit.
Language: Python 113 10 311
robinhad/kruk
Ukrainian instruction-tuned language models and datasets
Language: Jupyter Notebook 87 4 68
HarunoriKawano/BEST-RQ
Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in PyTorch.
Language: Python 62 5 34
IDRnD/VoxTube
The VoxTube dataset official repository
Language: HTML 61 5 41
ahaliassos/raven
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
Language: Python 59 9 95
YuanGongND/uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Language: Python 54 2 43
roger-tseng/av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
Language: Python 50 2 44
Alexander-H-Liu/dinosr
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Language: Python 47 4 34
DanielMengLiu/AudioVisualLip
Language: Python 20 1 11
roudimit/c2kd
Code for the C2KD paper (ICASSP 2023)
Language: Python 17 1 01
YasserdahouML/VSR_test_set
WildVSR
Language: Python 15 1 40
