zyjcsf's Stars
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
LargeWorldModel/LWM
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
open-mmlab/mmocr
OpenMMLab Text Detection, Recognition and Understanding Toolbox
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
young-geng/EasyLM
Large language models (LLMs) made easy. EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
NUS-HPC-AI-Lab/VideoSys
VideoSys: An easy and efficient system for video generation
collabora/WhisperFusion
WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.
shaked6540/YoutubePlaylistDownloader
A tool to download whole playlists, channels, or single videos from YouTube and optionally convert them to almost any format you like
microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
k2-fsa/k2
FSA/FST algorithms, differentiable, with PyTorch compatibility.
DmitryRyumin/INTERSPEECH-2023-24-Papers
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
DmitryRyumin/ICASSP-2023-24-Papers
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
pyf98/DPHuBERT
INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"
opendatakosovo/cyrillic-transliteration
Transliterate Cyrillic script to Latin script and vice versa.
RF5/transfusion-asr
Transcribing Speech with Multinomial Diffusion, training code and models.
stevenhillis/awesome-asr-contextualization
A curated list of awesome papers on contextualizing E2E ASR outputs
mtkresearch/clairaudience
Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning (ASRU2023)
juice500ml/xlm_to_xlsr
Official implementation of the paper "Distilling a Pretrained Language Model to a Multilingual ASR Model" (Interspeech 2022)
thuhcsi/Contextual-Biasing-Dataset
Open-source Mandarin biased-word dataset
LeeYongHyeok/dual_cross_modality-AVSR
An audio-visual speech recognition model with dual cross-modality attention, based on the sigmedia-AVSR code