xiangkanghuang

xiangkanghuang's Stars

yangdongchao/Open-Training-Moshi
A reproduction of the training process for Moshi
Language: Python 524
kyutai-labs/moshi
Language: Python, 4k stars, 294 forks
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
3.4k stars, 183 forks
JaesungHuh/voice-gender-classifier
Voice gender classifier using ECAPA-TDNN
Language: Python 72
Zeyi-Lin/HivisionIDPhotos
⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.
Language: Python, 9.9k stars, 945 forks
yl4579/StyleTTS-ZS
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
962
xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Language: Python 658
FireRedTeam/FireRedTTS
Language: Python 1124
3loi/NaturalVoices
Language: Jupyter Notebook 413
audacity/audacity
Audio Editor
Language: C, 12.3k stars, 2.2k forks
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
Language: TypeScript, 46.1k stars, 6.5k forks
Plachtaa/seed-vc
zero-shot voice conversion & singing voice conversion with in context learning
Language: Python 22023
amphionspace/tts-evaluation
An evaluation set for large-scale trained TTS models (Coming in Sep 2024)
11
yangdongchao/SimpleSpeech
The open source code for SimpleSpeech series
Language: Python 906
THUDM/CogVideo
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Language: Python, 7.5k stars, 697 forks
gpt-omni/mini-omni
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
Language: Python, 2.6k stars, 244 forks
AI4Finance-Foundation/FinGPT
FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
Language: Jupyter Notebook, 13.6k stars, 1.9k forks
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Language: Python, 1.8k stars, 145 forks
axolotl-ai-cloud/axolotl
Go ahead and axolotl questions
Language: Python, 7.6k stars, 824 forks
OpenBB-finance/OpenBB
Investment Research for Everyone, Everywhere.
Language: Python, 30.3k stars, 2.8k forks
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
Language: Python, 17.4k stars, 1.7k forks
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Language: Python, 3k stars, 321 forks
sh-lee-prml/PeriodWave
The official Implementation of PeriodWave and PeriodWave-Turbo
1117
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
Language: Python 77238
KoeAI/LLVC
Language: Python 37630
hacksider/Deep-Live-Cam
real time face swap and one-click video deepfake with only a single image
Language: Python, 35.8k stars, 5.1k forks
TXH-mercury/VALOR
Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Language: Python 25515
Audio-WestlakeU/RealMAN
A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization
Language: Python 626
NaiboWang/EasySpider
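A visual no-code/code-free web crawler/spider. 易采集 (EasySpider): visual software for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.
Language: JavaScript, 34.4k stars, 4.2k forks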
bghira/SimpleTuner
A general fine-tuning kit geared toward diffusion models.
Language: Python, 1.6k stars, 140 forks