xzm2004260

speech synthesis, TTS

Xiamen

Pinned Repositories

AByteOfNLP
Some code for an NLP tour
Language: Python
AlignmentServer
API for aligning singing voice to lyrics, as used on www.voicemagix.com. The core machine learning algorithms are MLP neural networks and hidden Markov models. Built on Django REST Framework.
Language: Python
Automatic_Speech_Recognition
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Language: Python
awesome-music-informatics
A curated list of awesome articles, tutorials, libraries, webpages, etc.
Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
Language: Python
DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Language: Python
FastImageProcessing
Fast Image Processing with Fully-Convolutional Networks
Language: Python
GPUImage
An open source iOS framework for GPU-based image and video processing
Language: Objective-C
marytts
MARY TTS -- an open-source, multilingual text-to-speech synthesis system written in pure Java
Language: Java
merlin
This is now the official location of the Merlin project.
Language: Python

xzm2004260's Repositories

xzm2004260/Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
Language: Python
xzm2004260/DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Language: Python
xzm2004260/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
xzm2004260/agc
Audiogen Codec
xzm2004260/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python
xzm2004260/audioFlux
A library for audio and music analysis and feature extraction.
xzm2004260/audioseal
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
xzm2004260/audiowmark
Audio Watermarking
xzm2004260/Automatic_Speech_Annotator
Automatic speech annotator that processes speech with voice activity detection, overlapping speech detection, speaker diarization, and automatic speech recognition
xzm2004260/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
xzm2004260/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
xzm2004260/DiJiang
The official implementation of "DiJiang: Efficient Large Language Models through Compact Kernelization"
xzm2004260/IMS-Toucan
Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart. Objectives of the development are simplicity, modularity, controllability and multilinguality.
xzm2004260/LangSegment
A multilingual (97 languages) tool for automatic recognition and segmentation of text content. A powerful automatic segmenter for mixed-language text, built for TTS.
xzm2004260/megatts2
Unofficial implementation of Megatts2
xzm2004260/metavoice-src
Foundational model for human-like, expressive TTS
xzm2004260/open-musiclm
Implementation of MusicLM, a text-to-music model published by Google Research, with a few modifications.
Language: Python
xzm2004260/parler-tts
Inference and training library for high-quality TTS models.
xzm2004260/python-jyutping
A Python tool for converting Chinese characters to Jyutping.
xzm2004260/RTNeural
Real-time neural network inferencing
xzm2004260/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
xzm2004260/so-vits-svc-fork
so-vits-svc fork with realtime support, improved interface and more features.
xzm2004260/sparse-vqvae
Experimental implementation for a sparse-dictionary based version of the VQ-VAE2 paper
xzm2004260/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
xzm2004260/supervoice-gpt
GPT-style network for phonemization with durations of text
xzm2004260/ttts
Train the next generation of TTS systems.
xzm2004260/USLM
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"
xzm2004260/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
xzm2004260/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
xzm2004260/wavmark
AI-based Audio Watermarking Tool