Shuo-H's Stars
microsoft/AudioEntailment
Audio Entailment: Deductive Reasoning for Audio Understanding
naklecha/llama3-from-scratch
Llama 3 implementation, one matrix multiplication at a time
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
mli/paper-reading
Paragraph-by-paragraph close reading of classic and new deep learning papers
KonanAI/konanai
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Audio-AGI/AudioSep
Official implementation of "Separate Anything You Describe"
etzinis/unsup_speech_enh_adaptation
Unsupervised domain adaptation for conversational speech enhancement using RemixIT
konan-ai/konanai-archive
espeak-ng/espeak-ng
eSpeak NG is an open source speech synthesizer that supports more than one hundred languages and accents.
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
facebookresearch/ImageBind
ImageBind: One Embedding Space to Bind Them All
MLSpeech/FormantsTracker
google-research/tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
YunyangZeng/TAPLoss