zyjcsf

zyjcsf's Stars

microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python 34.7k 342 2.7k 4k
kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Language: Shell 14.1k 696 1.6k 5.3k
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Language: Python 11.5k 201 2.2k 2.4k
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Language: Python 10k 132 49856
LargeWorldModel/LWM
Language: Python 7.1k 66 71549
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language: Python 5.9k 56 1.1k 644
open-mmlab/mmocr
OpenMMLab Text Detection, Recognition and Understanding Toolbox
Language: Python 4.3k 58 896743
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Language: Python 4.1k 90 1k 1.1k
young-geng/EasyLM
Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.
Language: Python 2.4k 42 88252
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
Language: Python 2k 21 246198
NUS-HPC-AI-Lab/VideoSys
VideoSys: An easy and efficient system for video generation
Language: Python 1.6k 26 70110
collabora/WhisperFusion
WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.
Language: Python 1.5k 17 37108
shaked6540/YoutubePlaylistDownloader
A tool to download whole playlists, channels, or single videos from YouTube and optionally convert them to almost any format you like
Language: C# 1.5k 28 226238
microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Language: Python 1.2k 26 84114
k2-fsa/k2
FSA/FST algorithms, differentiable, with PyTorch compatibility.
Language: Cuda 1.1k 77 378213
DmitryRyumin/INTERSPEECH-2023-24-Papers
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
629 89 442
DmitryRyumin/ICASSP-2023-24-Papers
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
Language: Python 355 28 416
pyf98/DPHuBERT
INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"
Language: Python 99 6 5 9
opendatakosovo/cyrillic-transliteration
Transliterate Cyrillic script to Latin script and vice versa.
Language: Python 97 6 1627
RF5/transfusion-asr
Transcribing Speech with Multinomial Diffusion, training code and models.
Language: Python 74 8 3 5
stevenhillis/awesome-asr-contextualization
A curated list of awesome papers on contextualizing E2E ASR outputs
72 2 2 9
mtkresearch/clairaudience
Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning (ASRU2023)
Language: Python 25 4 2 1
juice500ml/xlm_to_xlsr
Official implementation of the paper "Distilling a Pretrained Language Model to a Multilingual ASR Model" (Interspeech 2022)
Language: Python 12 3 2 4
thuhcsi/Contextual-Biasing-Dataset
Open-source Mandarin biased-word dataset
10 1 2 0
LeeYongHyeok/dual_cross_modality-AVSR
An audio-visual speech recognition model with dual cross-modality attention, based on the sigmedia-AVSR code
Language: Python 1 1 0 0