jefflai108
Ph.D. Student at MIT. Interested in self-supervised learning, spoken language acquisition, and audio-visual learning.
Cambridge, MA
jefflai108's Stars
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
CompVis/stable-diffusion
A latent text-to-image diffusion model
rclone/rclone
"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
prasmussen/gdrive
Google Drive CLI Client
OpenGVLab/LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
rosinality/vq-vae-2-pytorch
Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
google-research/parti
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
NVlabs/GroupViT
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
facebookresearch/mega
Sequence modeling with Mega.
mct10/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
yoyololicon/diffwave-sr
my-yy/s2v_rc
Speech2Vec Reality Check
lucidrains/n-grammer-pytorch
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in PyTorch
xinjli/alqalign
multilingual speech aligner
kylebgorman/syllabify
Python module for syllabifying English ARPABET transcriptions
desh2608/gmm-hmm-asr
Python implementation of simple GMM and HMM models for isolated digit recognition.
YuanGongND/uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
mhamilton723/DenseAV
Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
lstrgar/self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
huckiyang/awesome-neural-reprogramming-prompting
A curated list of awesome adversarial reprogramming and input prompting methods for neural networks since 2022
kamperh/vqwordseg
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.
jasonppy/word-discovery
Word Discovery in Visually Grounded, Self-Supervised Speech Models
zhaoyanpeng/xcfg
X (weighted / probabilistic) Context-Free Grammars
zhaoyanpeng/cpcfg
Fast and Modularized CFG-focused Models
GATECH-EIC/S3-Router
[NeurIPS 2022] "Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing" by Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Lin
lingjzhu/spoken_sent_embedding
Unsupervised spoken sentence embeddings
KaosEngineer/structured-uncertainty