jefflai108

Ph.D. Student at MIT. Interested in self-supervised learning, spoken language acquisition, and audio-visual learning.

Cambridge, MA

jefflai108's Stars

openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Language: Python 68.1k 571 08k
CompVis/stable-diffusion
A latent text-to-image diffusion model
Language: Jupyter Notebook 67.7k 558 71010.1k
rclone/rclone
"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
Language: Go 46.4k 577 5.5k4.2k
prasmussen/gdrive
Google Drive CLI Client
Language: Go 9k 223 5941.2k
OpenGVLab/LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
Language: Python 5.7k 78 142371
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python 4.5k 58 152383
rosinality/vq-vae-2-pytorch
Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
Language: Python 1.6k 20 77270
google-research/parti
1.5k 56 987
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
Language: Python 992 25 4676
NVlabs/GroupViT
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
Language: Python 724 11 6452
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Language: Python 428 15 1339
facebookresearch/mega
Sequence modeling with Mega.
Language: Python 297 126 1628
mct10/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Language: Python 147 14 610
yoyololicon/diffwave-sr
Language: Jupyter Notebook 78 5 88
my-yy/s2v_rc
Speech2Vec Reality Check
Language: Python 75 2 13
lucidrains/n-grammer-pytorch
Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch
Language: Python 72 7 31
xinjli/alqalign
multilingual speech aligner
Language: Python 71 10 15
kylebgorman/syllabify
Python module for syllabifying English ARPABET transcriptions
Language: Python 64 4 116
desh2608/gmm-hmm-asr
Python implementation of simple GMM and HMM models for isolated digit recognition.
Language: Python 59 4 322
YuanGongND/uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
Language: Python 55 2 43
mhamilton723/DenseAV
Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Language: Jupyter Notebook 53 3 39
lstrgar/self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
Language: Python 50 5 510
huckiyang/awesome-neural-reprogramming-prompting
A curated list of awesome adversarial reprogramming and input prompting methods for neural networks since 2022
Language: Python 35 5 00
kamperh/vqwordseg
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.
Language: Jupyter Notebook 35 3 39
jasonppy/word-discovery
Word Discovery in Visually Grounded, Self-Supervised Speech Models
Language: Jupyter Notebook 24 4 57
zhaoyanpeng/xcfg
X (weighted / probabilistic) Context-Free Grammars
Language: Python 24 2 12
zhaoyanpeng/cpcfg
Fast and Modularized CFG-focused Models
Language: Python 23 2 11
GATECH-EIC/S3-Router
[NeurIPS 2022] "Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing" by Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Lin
Language: Python 14 4 12
lingjzhu/spoken_sent_embedding
Unsupervised spoken sentence embeddings
Language: Python 14 2 00
KaosEngineer/structured-uncertainty
Language: Python 10 3 01
