xiangkanghuang's Stars
yangdongchao/Open-Training-Moshi
A reproduction of the training process for Moshi
kyutai-labs/moshi
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
JaesungHuh/voice-gender-classifier
Voice gender classifier using ECAPA-TDNN
Zeyi-Lin/HivisionIDPhotos
⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool. A lightweight algorithm for creating AI ID photos.
yl4579/StyleTTS-ZS
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
FireRedTeam/FireRedTTS
3loi/NaturalVoices
audacity/audacity
Audio Editor
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
Plachtaa/seed-vc
Zero-shot voice conversion and singing voice conversion with in-context learning
amphionspace/tts-evaluation
An evaluation set for large-scale trained TTS models (Coming in Sep 2024)
yangdongchao/SimpleSpeech
The open-source code for the SimpleSpeech series
THUDM/CogVideo
Text-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
gpt-omni/mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
AI4Finance-Foundation/FinGPT
FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
axolotl-ai-cloud/axolotl
Go ahead and axolotl questions
OpenBB-finance/OpenBB
Investment Research for Everyone, Everywhere.
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT-4o
sh-lee-prml/PeriodWave
The official implementation of PeriodWave and PeriodWave-Turbo
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
KoeAI/LLVC
hacksider/Deep-Live-Cam
Real-time face swap and one-click video deepfake using only a single image
TXH-mercury/VALOR
Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Audio-WestlakeU/RealMAN
A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization
NaiboWang/EasySpider
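A visual no-code web crawler/spider. EasySpider (易采集): a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawling tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.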
bghira/SimpleTuner
A general fine-tuning kit geared toward diffusion models.