kollobn's Stars
samuela/torch2jax
Run PyTorch in JAX. 🤝
nene1212/MaskGCT-Training
Training code for MaskGCT-T2S model.
DmitryRyumin/ICASSP-2023-24-Papers
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
iwangjian/Paper-Reading-ConvAI
📖 Paper reading list in conversational AI (constantly updating 🤗).
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
SpeechColab/Leaderboard
SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.
modelscope/ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
fishaudio/fish-speech
SOTA Open Source TTS
andysingal/llm-course
halsay/ASR-TTS-paper-daily
Update ASR paper everyday
Wataru-Nakata/miipher
Unofficial implementation of miipher
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
kaistmm/Audio-Mamba-AuM
Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"
yeyupiaoling/MASR
Pytorch实现的流式与非流式的自动语音识别框架,同时兼容在线和离线识别,目前支持Conformer、Squeezeformer、DeepSpeech2模型,支持多种数据增强方法。
s-nlp/transformers-course
Materials of transformers lecture course
huggingface/evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
NiuTrans/ABigSurveyOfLLMs
A collection of 150+ surveys on LLMs
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
e2b-dev/awesome-ai-agents
A list of AI autonomous agents
gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities。
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
pytorch/torchtitan
A PyTorch native library for large model training
kyutai-labs/moshi
yangdongchao/RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
segment-any-text/wtpsplit
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
FireRedTeam/FireRedTTS
An Open-Sourced LLM-empowered Foundation TTS System
EmulationAI/awesome-large-audio-models
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
aiola-lab/whisper-medusa
Whisper with Medusa heads
Lordog/dive-into-llms
《动手学大模型Dive into LLMs》系列编程实践教程
karpathy/LLM101n
LLM101n: Let's build a Storyteller