Qoboty

Pinned Repositories

3dmsr
3D model shape retrieval
Language: C++
AI-metrics
An open source project to document AI progress through data.
Language: Jupyter Notebook
alexa-sign-language-translator
A project to make Amazon Echo respond to sign language using your webcam
Language: JavaScript
ambient-gan
Code to reproduce results from the paper "AmbientGAN: Generative models from lossy measurements"
Language: Python
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python
ancient-text-restoration
Restoring ancient text using deep learning: a case study on Greek epigraphy.
Language: Python
apollo
An open autonomous driving platform
Language: C++
ASR-decoder
An ASR decoder and graph-construction project
Language: C++
asr_preprocessing
Python implementation of pre-processing for End-to-End speech recognition
Language: Python

Qoboty's Repositories

Qoboty/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python
Qoboty/Bert-VITS2-ext
Facial expression and animation tests based on Bert-VITS2
Language: Python
Qoboty/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Language: Python
Qoboty/ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Language: Python
Qoboty/CosyVoice
LLM-based TTS model providing full-stack inference, training, and deployment capabilities.
Language: Python
Qoboty/dclm
DataComp for Language Models
Language: HTML
Qoboty/Edge-Punct-Casing
Language: Python
Qoboty/ehmam
Language: Python
Qoboty/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Language: Python
Qoboty/fish-speech
Brand new TTS solution
Language: Python
Qoboty/FLUX-Controlnet-Inpainting
Language: Python
Qoboty/hertz-dev
First base model for full-duplex conversational audio
Language: Python
Qoboty/HierSpeechpp
The official implementation of HierSpeech++
Language: Python
Qoboty/Inpaint-Anything
Inpaint anything using Segment Anything and inpainting models.
Language: Jupyter Notebook
Qoboty/LLM4ESGPrediction
Qoboty/LSLM-Listening-while-Speaking-Language-Model
LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances human-computer interaction through real-time spoken dialogue capabilities.
Language: Python
Qoboty/ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Language: Python
Qoboty/moshi
Language: Python
Qoboty/OpenVoice
Instant voice cloning by MyShell
Language: Python
Qoboty/parler-tts
Inference and training library for high-quality TTS models.
Language: Python
Qoboty/pflow-encodec
Implementation of a TTS model based on the NVIDIA P-Flow TTS paper
Qoboty/Pyramid-Flow
Code for Pyramidal Flow Matching for Efficient Video Generative Modeling
Language: Python
Qoboty/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Qoboty/TTS-xtts
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Language: Python
Qoboty/ultravox
Qoboty/UMOE-Scaling-Unified-Multimodal-LLMs
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
Language: Python
Qoboty/UniCATS-CTX-vec2wav
Code for CTX-vec2wav in UniCATS
Language: Python
Qoboty/VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Language: Jupyter Notebook
Qoboty/VoiceFlow-TTS
Language: Python
Qoboty/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling