Fuann's Stars
state-spaces/mamba
Mamba SSM architecture
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 100+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
s3prl/s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
stanfordnlp/pyreft
ReFT: Representation Finetuning for Language Models
yousinix/portfolYOU
A beautiful portfolio Jekyll theme that works with GitHub Pages.
k2-fsa/icefall
DmitryRyumin/INTERSPEECH-2023-24-Papers
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
santi-pdp/pase
Problem Agnostic Speech Encoder
jonatasgrosman/huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
carlthome/python-audio-effects
Apply audio effects such as reverb and EQ directly to audio files or NumPy ndarrays.
oliverguhr/wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
PolyAI-LDN/pheme
marekrei/sequence-labeler
Neural network sequence labeling model
YuanGongND/gopt
Code for the ICASSP 2022 paper "Transformer-Based Multi-Aspect Multi-Granularity Non-native English Speaker Pronunciation Assessment".
JusperLee/SPMamba
ga642381/SpeechPrompt
**Interspeech 2022** "SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks": speech processing with the prompting paradigm
aleXiehta/PhoneFortifiedPerceptualLoss
Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement
lstrgar/self-supervised-phone-segmentation
Phoneme segmentation using pre-trained speech models
archiki/Robust-E2E-ASR
This repository contains the code for our upcoming paper "An Investigation of End-to-End Models for Robust Speech Recognition" at ICASSP 2021.
articulatory/articulatory
Deep Articulatory Synthesis and Inversion
JazminVidal/gop-dnn-epadb
Goodness of Pronunciation using Kaldi on the Epa-DB database
Observeai-Research/Phoneme-BERT
JuanPZuluaga/accent-recog-slt2022
Repository for Accent Recognition (Hackathon @SLT2022)
juice500ml/dysarthria-gop
doheejin/SB_loss_PA
This repository is the implementation of the paper, "Score-balanced Loss for Multi-aspect Pronunciation Assessment" (Interspeech 2023).
hcraighead/automated-english-transcription-grader
Investigating the effect of auxiliary objectives for the automated grading of learner English speech transcriptions (ACL 2020)