forwiat

Pinned Repositories

3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Language:Python00
acad-homepage.github.io
AcadHomepage: A Modern and Responsive Academic Personal Homepage
Language:SCSS00
AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
Language:Python0 0 00
academicpages.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
Language:JavaScript0 1 00
AdaIN-VC
An unofficial implementation of the paper "One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization".
Language:Python0 1 00
AdaSpeech2
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
Language:Jupyter Notebook0 1 00
ICASSP-2023-Papers
ICASSP 2023 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023 conference. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
1 0 00
mandarin_tts
An implementation of Mandarin statistical parametric speech synthesis
Language:Python2 2 10
music_recognition
An implementation of music type recognition in TensorFlow
Language:Python1 2 00
pytorch-multi-gpu-training
A summary of the methods and principles of single-machine multi-GPU training in PyTorch
Language:Python3 1 01

forwiat's Repositories

forwiat/Awesome-Audio-LLM
Audio Large Language Models
forwiat/Baichuan-Audio
forwiat/bailing
Bailing is a GPT-4o-like voice conversation bot built on ASR + LLM + TTS, with latency as low as 800 ms; it runs on low-end hardware and supports interruption
forwiat/ChatTTSPlus
Extension of ChatTTS, 3x Faster on Windows, Support Voice Cloning and Mobile Deployment
forwiat/CleanS2S
High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction prototype agent implemented in just one file!
forwiat/ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
forwiat/deep-cross-attention
Implementation of the proposed DeepCrossAttention by Heddes et al. at Google Research, in PyTorch
forwiat/focalcodec
A low-bitrate single-codebook 16 kHz speech codec based on focal modulation
forwiat/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
forwiat/hertz-dev
first base model for full-duplex conversational audio
forwiat/kokoro-tts
A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats including EPUB books and PDF documents.
forwiat/ltp
Language Technology Platform
forwiat/MaskGCT-Training
Training code for MaskGCT-T2S model.
forwiat/MiniCPM-o
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
forwiat/MoneyPrinterV2
Automate the process of making money online.
forwiat/native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper
forwiat/OuteTTS
Interface for OuteTTS models.
forwiat/preference-flow-matching
Official code implementation for the work Preference Alignment with Flow Matching (NeurIPS 2024)
forwiat/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
forwiat/sheet
Speech Human Evaluation Estimation Toolkit (SHEET)
forwiat/tokensynth
The official implementation of TokenSynth (ICASSP 2025)
forwiat/vec2wav2.0
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
forwiat/versa
Versatile Evaluation of Speech and Audio
forwiat/VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
forwiat/VoiceBench
VoiceBench: Benchmarking LLM-Based Voice Assistants
forwiat/vscode-remote-release
Visual Studio Code Remote Development: Open any folder in WSL, in a Docker container, or on a remote machine using SSH and take advantage of VS Code's full feature set.
forwiat/WavChat
A Survey of Spoken Dialogue Models (60 pages)
forwiat/weekly
Tech Enthusiast Weekly, published every Friday
forwiat/WeTextProcessing
Text Normalization & Inverse Text Normalization
forwiat/X-Codec-2.0
Codec for paper: LLaSA: Scaling Train Time and Test Time Compute for LLaMA based Speech Synthesis.